Strange HAVING-clause behaviour

Say you want to return a resultset containing the last two days per week. First, construct the demo table:
CREATE TABLE TEST_TIME (WEEKCODE INT, ISODATE CHAR(8))
INSERT INTO TEST_TIME VALUES (1, '20060101')
INSERT INTO TEST_TIME VALUES (1, '20060102')
INSERT INTO TEST_TIME VALUES (1, '20060103')
INSERT INTO TEST_TIME VALUES (2, '20060108')
INSERT INTO TEST_TIME VALUES (2, '20060109')
INSERT INTO TEST_TIME VALUES (2, '20060110')
INSERT INTO TEST_TIME VALUES (2, '20060111')
The following SQL query will give the desired resultset:
SELECT t.WEEKCODE, t.ISODATE FROM TEST_TIME t
WHERE t.ISODATE >=
(
SELECT TOP 1 ISODATE FROM
(
SELECT TOP 2 ISODATE
FROM TEST_TIME t2
WHERE t.WEEKCODE = t2.WEEKCODE
ORDER BY ISODATE DESC
) t1
ORDER BY ISODATE ASC
)
ORDER BY t.WEEKCODE
It returns 4 records, as espected:
1 20060102
1 20060103
2 20060110
2 20060111
Just for fun, I tried the same thing using a GROUP BY and HAVING-clause like this:
SELECT t.WEEKCODE, t.ISODATE FROM TEST_TIME t
GROUP BY t.WEEKCODE, t.ISODATE
HAVING t.ISODATE >=
(
SELECT TOP 1 ISODATE FROM
(
SELECT TOP 2 ISODATE
FROM TEST_TIME t2
WHERE t.WEEKCODE = t2.WEEKCODE
ORDER BY ISODATE DESC
) t1
ORDER BY ISODATE ASC
)
ORDER BY t.WEEKCODE
Far less efficient, but one would expect it to return the same resultset. But it does not! In fact it returns 28 records:
1 20060102
1 20060102
1 20060102
1 20060102
1 20060102
1 20060102
1 20060102
1 20060103
1 20060103
1 20060103
1 20060103
1 20060103
1 20060103
1 20060103
2 20060110
2 20060110
2 20060110
2 20060110
2 20060110
2 20060110
2 20060110
2 20060111
2 20060111
2 20060111
2 20060111
2 20060111
2 20060111
2 20060111
That is even more than there are in the table! This is very strange behaviour. Stripping the HAVING-clause returns 7 records, as expected. But adding it results in a cartesian product of the 7 source records by the 4 top-2 records (2 for week 1 and 2 for week 2) resulting in 7x4=28 records.

From BOL:
"The HAVING clause is then applied to the rows in the result set that are produced by grouping. Only the groups that meet the HAVING conditions appear in the query output. You can apply a HAVING clause only to columns that also appear in the GROUP BY clause or in an aggregate function."

So, first the grouping, THEN an extra filtering on that set. How can filtering 7 records result in 28 records? Furthermore, the 28 records are not distinct records. Something you would expect when using a GROUP BY-clause...

Conclusion is that in case of a correlated having-clause query the having-clause is evaluated before the group by-clause.

No comments: