sql-server add tag
sam bishop (imported from SE)
I was reviewing the SQL Server physical operators listed on TechNet (don't judge, you know you've done it) and [read that the Hash Match physical operator is sometimes used to implement the `UNION` logical operator](https://technet.microsoft.com/en-us/library/ms189582(v=sql.105).aspx).

I have never seen that done and would like to learn more.  An example query would be great.  When is it used and when is it better than the alternatives?  (Those are usually the same, but not always.)
Top Answer
Paul White (imported from SE)
> An example query would be great.

Using a numbers table (integers 1...n, where n needs to be at least 1000 for this example):

    SELECT N.n % 10, SPACE(100) 
    FROM dbo.Numbers AS N 
    WHERE N.n BETWEEN 1 AND 1000
    UNION
    SELECT 999, SPACE(100);

Results:

``` none
999
6
4
8
3
1
0
7
5
9
2
```

[![hash union][1]][1]

> When is it used and when is it better than the alternatives?

Hash union is not very common. It is preferred when one table is wide and has many duplicates, while the other is small (relatively few rows) and known to be distinct. A wide build side *with lots of duplicates* plays to the hash table's strengths because it is immediately only stored once per dupe.

### How it works

The Hash Union operator builds a hash table on the upper (build) input eliminating duplicates as it goes (like a hash aggregate performing a distinct). It then reads rows from the lower (probe) input. If there is no match in the hash table, the row is returned. When the probe input is exhausted, the operator returns each row in the hash table.

Hash Union does not add rows from the probe side to the hash table, so it cannot eliminate duplicates from that input. The optimizer either has to have a guarantee of uniqueness, or add a grouping operator to the probe side.

  [1]: https://i.stack.imgur.com/9tWBw.png

This room is for discussion about this question.

Once logged in you can direct comments to any contributor here.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.