Union implemented with Hash Match operator - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

Union implemented with Hash Match operator

sql-server add tag

sam bishop (imported from SE)

I was reviewing the SQL Server physical operators listed on TechNet (don't judge, you know you've done it) and [read that the Hash Match physical operator is sometimes used to implement the `UNION` logical operator](https://technet.microsoft.com/en-us/library/ms189582(v=sql.105).aspx).

I have never seen that done and would like to learn more.  An example query would be great.  When is it used and when is it better than the alternatives?  (Those are usually the same, but not always.)

Top Answer

Paul White (imported from SE)

> An example query would be great.

Using a numbers table (integers 1...n, where n needs to be at least 1000 for this example):

    SELECT N.n % 10, SPACE(100) 
    FROM dbo.Numbers AS N 
    WHERE N.n BETWEEN 1 AND 1000
    UNION
    SELECT 999, SPACE(100);

Results:

``` none
999
6
4
8
3
1
0
7
5
9
2
```

[![hash union][1]][1]

> When is it used and when is it better than the alternatives?

Hash union is not very common. It is preferred when one table is wide and has many duplicates, while the other is small (relatively few rows) and known to be distinct. A wide build side *with lots of duplicates* plays to the hash table's strengths because it is immediately only stored once per dupe.

### How it works

The Hash Union operator builds a hash table on the upper (build) input eliminating duplicates as it goes (like a hash aggregate performing a distinct). It then reads rows from the lower (probe) input. If there is no match in the hash table, the row is returned. When the probe input is exhausted, the operator returns each row in the hash table.

Hash Union does not add rows from the probe side to the hash table, so it cannot eliminate duplicates from that input. The optimizer either has to have a guarantee of uniqueness, or add a grouping operator to the probe side.

  [1]: https://i.stack.imgur.com/9tWBw.png

1 Answer