I was reviewing the SQL Server physical operators listed on TechNet (don't judge, you know you've done it) and [read that the Hash Match physical operator is sometimes used to implement the `UNION` logical operator](https://technet.microsoft.com/en-us/library/ms189582(v=sql.105).aspx).
I have never seen that done and would like to learn more. An example query would be great. When is it used and when is it better than the alternatives? (Those are usually the same, but not always.)
> An example query would be great.
Using a numbers table (integers 1...n, where n needs to be at least 1000 for this example):
SELECT N.n % 10, SPACE(100)
FROM dbo.Numbers AS N
WHERE N.n BETWEEN 1 AND 1000
SELECT 999, SPACE(100);
> When is it used and when is it better than the alternatives?
Hash union is not very common. It is preferred when one table is wide and has many duplicates, while the other is small (relatively few rows) and known to be distinct. A wide build side *with lots of duplicates* plays to the hash table's strengths because it is immediately only stored once per dupe.
### How it works
The Hash Union operator builds a hash table on the upper (build) input eliminating duplicates as it goes (like a hash aggregate performing a distinct). It then reads rows from the lower (probe) input. If there is no match in the hash table, the row is returned. When the probe input is exhausted, the operator returns each row in the hash table.
Hash Union does not add rows from the probe side to the hash table, so it cannot eliminate duplicates from that input. The optimizer either has to have a guarantee of uniqueness, or add a grouping operator to the probe side.