sql-server add tag
Paul White (imported from SE)
A question that arose in a chat discussion:

I know *hash join* bailout switches internally to a sort of nested loops thing.

What does SQL Server do for a *hash aggregate* bailout (if it can happen at all)?
Top Answer
Paul White (imported from SE)
*Hash join* and *hash aggregate* both use the same operator code internally, though a hash aggregate uses only a single (build) input. The basic operation of *hash aggregate* is [described by Craig Freedman][1]:

>As with hash join, the hash aggregate requires memory.  Before executing a query with a hash aggregate, SQL Server uses cardinality estimates to estimate how much memory we need to execute the query.  With a hash join, we store each build row, so the total memory requirement is proportional to the number and size of the build rows.  The number of rows that join and the output cardinality of the join have no effect on the memory requirement of the join.  With a hash aggregate, we store one row for each group, so the total memory requirement is actually proportional to the number and size of the output groups or rows.  If we have fewer unique values of the group by column(s) and fewer groups, we need less memory.  If we have more unique values of the group by column(s) and more groups, we need more memory.

He goes on to talk about hash recursion:

>So, what happens if we run out of memory?  Again, like hash join, if we run out of memory, we must begin spilling rows to tempdb.  We spill one or more buckets or partitions including any partially aggregated results along with any additional new rows that hash to the spilled buckets or partitions.  Although we do not attempt to aggregate the spilled new rows, we do hash them and divide them up into several buckets or partitions.  Once we’ve finished processing all input groups, we output the completed in-memory groups and repeat the algorithm by reading back and aggregating one spilled partition at a time.  By dividing the spilled rows into multiple partitions, we reduce the size of each partition and, thus, reduce the risk that the algorithm will need to repeat many times.

### Bailout

*Hash bailout* is lightly documented, but mentioned by Nacho Alonso Portillo in [What’s the maximum level of recursion for the hash iterator before forcing bail-out?][2]

>The value is a constant, hard coded in the product, and its value is five (5). This means that before the hash scan operator resorts to a sort based algorithm for any given subpartition that doesn’t fit into the granted memory from the workspace, five previous attempts to subdivide the original partition into smaller partitions must have happened.

The "hash scan operator" mentioned there is a reference to the internal class `CQScanHash` in `sqlmin.dll`. This class heads the implementation of the hash operator (in all its forms, including partial aggregates and flow distinct) we see in execution plans.

### Bailout algorithm

This brings us to the heart of your questions - what exactly does the bailout algorithm do? Is it "sort based" or based on "a sort of nested loops thing"?

It is arguably both, depending on your point of view. When hash recursion reaches level 5, the in-memory hash partition changes from being a hash table to an initially empty b-tree index on the hash values. Each row from a single previously-spilled hash partition is looked up in the b-tree index and inserted (new group) or updated (maintaining aggregates) as appropriate.

This series of unordered inserts to a b-tree can equally well be seen as an [insertion sort][3] or as an indexed nested loops lookup.

In any case, this fallback algorithm is guaranteed to complete eventually without allocating more memory. It may require multiple passes if the space available for the b-tree is not sufficient to hold all the grouping keys and aggregates from the overflow partition.

Once the memory available to hold the b-tree index is exhausted, any further rows (from the current spilled partition) are sent to a single new *tempdb* partition (which is guaranteed to be smaller) and the process repeats as necessary. The spill level remains at 5 because hash *recursion* has ended. Some processing details can be observed with undocumented trace flag 7357.

  [1]: https://blogs.msdn.microsoft.com/craigfr/2006/09/20/hash-aggregate/
  [2]: https://blogs.msdn.microsoft.com/ialonso/2012/09/05/whats-the-maximum-level-of-recursion-for-the-hash-iterator-before-forcing-bail-out/
  [3]: https://en.wikipedia.org/wiki/Insertion_sort

This room is for discussion about this question.

Once logged in you can direct comments to any contributor here.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.