sam bishop imported from SE
I was reviewing the SQL Server physical operators listed on TechNet (don't judge, you know you've done it) and [read that the Hash Match physical operator is sometimes used to implement the `UNION` logical operator](https://technet.microsoft.com/en-us/library/ms189582(v=sql.105).aspx).

I have never seen that done and would like to learn more.  An example query would be great.  When is it used and when is it better than the alternatives?  (Those are usually the same, but not always.)
Top Answer
Paul White
> An example query would be great.

Using a numbers table (integers 1...n, where n needs to be at least 1000 for this example):

    SELECT N.n % 10, SPACE(100) 
    FROM dbo.Numbers AS N 
    WHERE N.n BETWEEN 1 AND 1000
    SELECT 999, SPACE(100);


``` none

[![hash union][1]][1]

> When is it used and when is it better than the alternatives?

Hash union is not very common. It is preferred when one table is wide and has many duplicates, while the other is small (relatively few rows) and known to be distinct. A wide build side *with lots of duplicates* plays to the hash table's strengths because it is immediately only stored once per dupe.

### How it works

The Hash Union operator builds a hash table on the upper (build) input eliminating duplicates as it goes (like a hash aggregate performing a distinct). It then reads rows from the lower (probe) input. If there is no match in the hash table, the row is returned. When the probe input is exhausted, the operator returns each row in the hash table.

Hash Union does not add rows from the probe side to the hash table, so it cannot eliminate duplicates from that input. The optimizer either has to have a guarantee of uniqueness, or add a grouping operator to the probe side.

  [1]: https://i.stack.imgur.com/9tWBw.png
Union implemented with Hash Match operator

This is a dedicated room for discussion about this question.

Once logged in you can direct comments to the question poster (or any answer poster) here.