Check out this query. It's pretty simple (see the end of the post for table and index definitions, and a repro script):
SELECT MAX(Revision)
FROM dbo.TheOneders
WHERE Id = 1 AND 1 = (SELECT 1);
> Note: the "AND 1 = (SELECT 1) is just to keep this query from being auto-parameterized, which I felt like was confusing the issue - it actually gets the same plan with or without that clause though
And here's the plan ([paste the plan link)][1]:
[![plan with a stream agg][2]][2]
Since there is a "top 1" there, I was surprised to see the stream aggregate operator. It doesn't seem necessary to me, since there is guaranteed to only be one row.
To test that theory, I tried out this logically equivalent query:
SELECT MAX(Revision)
FROM dbo.TheOneders
WHERE Id = 1
GROUP BY Id;
Here's the plan for that one ([paste the plan link][4]):
[![plan without a stream agg][3]][3]
Sure enough, the group by plan is able to get by without the stream aggregate operator.
Notice that both queries read "backwards" from the end of the index and do a "top 1" to get the max revision.
What am I missing here? **Is the stream aggregate actually doing work in the first query, or should it be able to be eliminated (and it's just a limitation of the optimizer that it's not)?**
By the way, I realize this is not an incredibly practical problem (both queries report 0 ms of CPU and elapsed time), I'm just curious about the internals / behavior being exhibited here.
---
Here's the setup code I ran before running the two queries above:
DROP TABLE IF EXISTS dbo.TheOneders;
GO
CREATE TABLE dbo.TheOneders
(
Id INT NOT NULL,
Revision SMALLINT NOT NULL,
Something NVARCHAR(23),
CONSTRAINT PK_TheOneders PRIMARY KEY NONCLUSTERED (Id, Revision)
);
GO
INSERT INTO dbo.TheOneders
(Id, Revision, Something)
SELECT DISTINCT TOP 1000
1, m.message_id, 'Do...'
FROM sys.messages m
ORDER BY m.message_id
OPTION (MAXDOP 1);
INSERT INTO dbo.TheOneders
(Id, Revision, Something)
SELECT DISTINCT TOP 100
2, m.message_id, 'Do that thing you do...'
FROM sys.messages m
ORDER BY m.message_id
OPTION (MAXDOP 1);
GO
[1]: https://www.brentozar.com/pastetheplan/?id=HyQibq8gE
[2]: https://i.stack.imgur.com/jRU17.png
[3]: https://i.stack.imgur.com/3WQGs.png
[4]: https://www.brentozar.com/pastetheplan/?id=SkCN7qLxV
You can see the role of this aggregate if no rows match the `WHERE` clause.
SELECT MAX(Revision)
FROM dbo.TheOneders
WHERE Id = 1
AND 1 = 1 /*To avoid auto parameterisation*/
AND Id%3 = 4 /*always false*/
In that case zero rows go into the aggregate but it still emits one as the correct semantics are to return `NULL` in this case.
[![enter image description here][1]][1]
This is a scalar aggregate as opposed to a vector one.
Your "logically equivalent" query is not equivalent. Adding `GROUP BY Id` would make it a vector aggregate and then the correct behaviour would be to return no rows.
See [Fun with Scalar and Vector Aggregates][2] for more about this.
[1]: https://i.stack.imgur.com/CSvmw.png
[2]: https://sqlkiwi.blogspot.com/2012/03/fun-with-aggregates.html