sql-server add tag
5 years ago sepupic (imported from SE)

Here we have two similar queries using grouping sets where the SELECT clause contains some expressions calculated in aggregation:

it’s plan is here: first query plan

and

the corresponding plan is here: second query plan

Both the queries first calculate some expression for aggregation, RN10 / 10 in the first case and SUBSTRING(RN,3,99) in the second, then the same expression is used in the SELECT clause but as the first plan shows it’s re-calculated in the first query and it’s not in the second.

As the result we have NULLs in the first result set that is quite unexpectedly:

results

Can someone explain why the first query makes the calculation 2 times (one in aggregation and one more time in the final select) while the second makes it one time only?

Top Answer
5 years ago Martin Smith (imported from SE)

I’m going to use a simpler example where it is clear to see what the expected results are.

Query 1

Query 1 Results

Query 2

Query 2 Results

Despite the ORDER BY SurnameInitial and the fact that NULL sorts first in SQL Server the rows with SurnameInitial as NULL are ordered last.

Query 1 and 2 should return the same results. The problem is that SQL Server decides to treat it like the following SQL

This just looks like a bug to me (trace flag 8605 shows that the damage is already done in the initial query tree representation). BUG REPORT.

Query 3

Query 3 Results

Query3 does not meet the problematic pattern of grouping on a column and an expression referencing that column. It wouldn’t even be possible for the same issue to occur here anyway because the grouping sets part is equivalent to

This does not pass out the entire FirstName column upstream (or even have a guaranteed unique FirstName column that could be passed out) so it isn’t possible for the LEFT(FirstName,1) expression to be calculated on top of that.

For the same reason you don’t see the issue with (RN10), (SUBSTRING(RN,3,99)).

@i-one reasons in the comments that it is likely

a bug in normalization (algebrization). It has logic that finds a match for non-aggregated columns and expressions in the SELECT list within members of GROUP BY. The same logic seemingly allows us to write for example

without having to explicitly add the calculated expression as below

Or another example would be

In this case the LEFT(Surname,2) is allowed and the only way of computing it would be to do it in the manner that is problematic for the LEFT(Surname,1) case.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.