or
sql-server sql-server-2017
Josh Darnell
Check out this query.  It's pretty simple (see the end of the post for table and index definitions, and a repro script):

    SELECT MAX(Revision)
    FROM dbo.TheOneders
    WHERE Id = 1 AND 1 = (SELECT 1);

> Note: the "AND 1 = (SELECT 1) is just to keep this query from being auto-parameterized, which I felt like was confusing the issue - it actually gets the same plan with or without that clause though

And here's the plan ([paste the plan link)][1]:

[![plan with a stream agg][2]][2]

Since there is a "top 1" there, I was surprised to see the stream aggregate operator.  It doesn't seem necessary to me, since there is guaranteed to only be one row.

To test that theory, I tried out this logically equivalent query:

    SELECT MAX(Revision)
    FROM dbo.TheOneders
    WHERE Id = 1
    GROUP BY Id;

Here's the plan for that one ([paste the plan link][4]):

[![plan without a stream agg][3]][3]

Sure enough, the group by plan is able to get by without the stream aggregate operator.

Notice that both queries read "backwards" from the end of the index and do a "top 1" to get the max revision.

What am I missing here?  **Is the stream aggregate actually doing work in the first query, or should it be able to be eliminated (and it's just a limitation of the optimizer that it's not)?**

By the way, I realize this is not an incredibly practical problem (both queries report 0 ms of CPU and elapsed time), I'm just curious about the internals / behavior being exhibited here.

---

Here's the setup code I ran before running the two queries above:

    DROP TABLE IF EXISTS dbo.TheOneders;
    GO
    
    CREATE TABLE dbo.TheOneders
    (
        Id INT NOT NULL,
        Revision SMALLINT NOT NULL,
        Something NVARCHAR(23),
    
        CONSTRAINT PK_TheOneders PRIMARY KEY NONCLUSTERED (Id, Revision)
    );
    GO
    
    INSERT INTO dbo.TheOneders
        (Id, Revision, Something)
    SELECT DISTINCT TOP 1000 
        1, m.message_id, 'Do...'
    FROM sys.messages m
    ORDER BY m.message_id
    OPTION (MAXDOP 1);
    
    INSERT INTO dbo.TheOneders
        (Id, Revision, Something)
    SELECT DISTINCT TOP 100 
        2, m.message_id, 'Do that thing you do...'
    FROM sys.messages m
    ORDER BY m.message_id
    OPTION (MAXDOP 1);
    GO


  [1]: https://www.brentozar.com/pastetheplan/?id=HyQibq8gE
  [2]: https://i.stack.imgur.com/jRU17.png
  [3]: https://i.stack.imgur.com/3WQGs.png
  [4]: https://www.brentozar.com/pastetheplan/?id=SkCN7qLxV
Top Answer
Martin Smith
You can see the role of this aggregate if no rows match the `WHERE` clause.

    SELECT MAX(Revision)
    FROM   dbo.TheOneders
    WHERE  Id = 1
           AND 1 = 1 /*To avoid auto parameterisation*/
           AND Id%3 = 4  /*always false*/

In that case zero rows go into the aggregate but it still emits one as the correct semantics are to return `NULL` in this case.

[![enter image description here][1]][1]

This is a scalar aggregate as opposed to a vector one. 

Your "logically equivalent" query is not equivalent. Adding `GROUP BY Id` would make it a vector aggregate and then the correct behaviour would be to return no rows.

See [Fun with Scalar and Vector Aggregates][2] for more about this.


  [1]: https://i.stack.imgur.com/CSvmw.png
  [2]: https://sqlkiwi.blogspot.com/2012/03/fun-with-aggregates.html
Why is this stream aggregate necessary?

This is a dedicated room for discussion about this question.

Once logged in you can direct comments to the question poster (or any answer poster) here.