The `MERGE` statement has a complex syntax and an even more complex implementation, but essentially the idea is to join two tables, filter down to rows that need to be changed (inserted, updated, or deleted), and then to perform the requested changes. Given the following sample data:
```sql
DECLARE @CategoryItem AS TABLE
(
CategoryId integer NOT NULL,
ItemId integer NOT NULL,
PRIMARY KEY (CategoryId, ItemId),
UNIQUE (ItemId, CategoryId)
);
DECLARE @DataSource AS TABLE
(
CategoryId integer NOT NULL,
ItemId integer NOT NULL
PRIMARY KEY (CategoryId, ItemId)
);
INSERT @CategoryItem
(CategoryId, ItemId)
VALUES
(1, 1),
(1, 2),
(1, 3),
(2, 1),
(2, 3),
(3, 5),
(3, 6),
(4, 5);
INSERT @DataSource
(CategoryId, ItemId)
VALUES
(2, 2);
```
### Target
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 2 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 3 ║
║ 3 ║ 5 ║
║ 4 ║ 5 ║
║ 3 ║ 6 ║
╚════════════╩════════╝
### Source
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 2 ║ 2 ║
╚════════════╩════════╝
The desired outcome is to replace data in the target with data from the source, but only for `CategoryId = 2`. Following the description of `MERGE` given above, we should write a query that joins the source and target on the keys only, and filter rows only in the `WHEN` clauses:
```sql
MERGE INTO @CategoryItem AS TARGET
USING @DataSource AS SOURCE ON
SOURCE.ItemId = TARGET.ItemId
AND SOURCE.CategoryId = TARGET.CategoryId
WHEN NOT MATCHED BY SOURCE
AND TARGET.CategoryId = 2
THEN DELETE
WHEN NOT MATCHED BY TARGET
AND SOURCE.CategoryId = 2
THEN INSERT (CategoryId, ItemId)
VALUES (CategoryId, ItemId)
OUTPUT
$ACTION,
ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;
```
This gives the following results:
╔═════════╦════════════╦════════╗
║ $ACTION ║ CategoryId ║ ItemId ║
╠═════════╬════════════╬════════╣
║ DELETE ║ 2 ║ 1 ║
║ INSERT ║ 2 ║ 2 ║
║ DELETE ║ 2 ║ 3 ║
╚═════════╩════════════╩════════╝
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 1 ║ 1 ║
║ 1 ║ 2 ║
║ 1 ║ 3 ║
║ 2 ║ 2 ║
║ 3 ║ 5 ║
║ 3 ║ 6 ║
║ 4 ║ 5 ║
╚════════════╩════════╝
The execution plan is:
![Merge plan][1]
Notice both tables are scanned fully. We might think this inefficient, because only rows where `CategoryId = 2` will be affected in the target table. This is where the warnings in Books Online come in. One misguided attempt to optimize to touch only necessary rows in the target is:
```sql
MERGE INTO @CategoryItem AS TARGET
USING
(
SELECT CategoryId, ItemId
FROM @DataSource AS ds
WHERE CategoryId = 2
) AS SOURCE ON
SOURCE.ItemId = TARGET.ItemId
AND TARGET.CategoryId = 2
WHEN NOT MATCHED BY TARGET THEN
INSERT (CategoryId, ItemId)
VALUES (CategoryId, ItemId)
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT
$ACTION,
ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;
```
The logic in the `ON` clause is applied as part of the join. In this case, the join is a full outer join (see [this Books Online entry][2] for why). Applying the check for category 2 on the target rows as part of an outer join ultimately results in rows with a different value being deleted (because they do not match the source):
╔═════════╦════════════╦════════╗
║ $ACTION ║ CategoryId ║ ItemId ║
╠═════════╬════════════╬════════╣
║ DELETE ║ 1 ║ 1 ║
║ DELETE ║ 1 ║ 2 ║
║ DELETE ║ 1 ║ 3 ║
║ DELETE ║ 2 ║ 1 ║
║ INSERT ║ 2 ║ 2 ║
║ DELETE ║ 2 ║ 3 ║
║ DELETE ║ 3 ║ 5 ║
║ DELETE ║ 3 ║ 6 ║
║ DELETE ║ 4 ║ 5 ║
╚═════════╩════════════╩════════╝
╔════════════╦════════╗
║ CategoryId ║ ItemId ║
╠════════════╬════════╣
║ 2 ║ 2 ║
╚════════════╩════════╝
The root cause is the same reason predicates behave differently in an outer join `ON` clause than they do if specified in the `WHERE` clause. The `MERGE` syntax (and the join implementation depending on the clauses specified) just make it harder to see that this is so.
The [guidance in Books Online][3] (expanded in the [Optimizing Performance][4] entry) offers guidance that will ensure the correct semantic is expressed using `MERGE` syntax, without the user necessarily having to understand all the implementation details, or account for the ways in which the optimizer might legitimately rearrange things for execution efficiency reasons.
The documentation offers three potential ways to implement early filtering:
* **Specifying a filtering condition in the `WHEN` clause** guarantees correct results, but may mean that more rows are read and processed from the source and target tables than is strictly necessary (as seen in the first example).
* **Updating through a view** that contains the filtering condition also guarantees correct results (since changed rows must be accessible for update through the view) but this does require a dedicated view, and one that follows the odd conditions for updating views.
* **Using a common table expression** carries similar risks to adding predicates to the `ON` clause, but for slightly different reasons. In many cases it will be safe, but it requires expert analysis of the execution plan to confirm this (and extensive practical testing). For example:
```sql
WITH TARGET AS
(
SELECT *
FROM @CategoryItem
WHERE CategoryId = 2
)
MERGE INTO TARGET
USING
(
SELECT CategoryId, ItemId
FROM @DataSource
WHERE CategoryId = 2
) AS SOURCE ON
SOURCE.ItemId = TARGET.ItemId
AND SOURCE.CategoryId = TARGET.CategoryId
WHEN NOT MATCHED BY TARGET THEN
INSERT (CategoryId, ItemId)
VALUES (CategoryId, ItemId)
WHEN NOT MATCHED BY SOURCE THEN
DELETE
OUTPUT
$ACTION,
ISNULL(INSERTED.CategoryId, DELETED.CategoryId) AS CategoryId,
ISNULL(INSERTED.ItemId, DELETED.ItemId) AS ItemId
;
```
This produces correct results (not repeated) with a more optimal plan:
![Merge plan 2][5]
The plan only reads rows for category 2 from the target table. This might be an important performance consideration if the target table is large, but it is all too easy to get this wrong using `MERGE` syntax.
Sometimes, it is easier to write the `MERGE` as separate DML operations. This approach can even **perform better** than a single `MERGE`, a fact which often surprises people.
```sql
DELETE ci
FROM @CategoryItem AS ci
WHERE ci.CategoryId = 2
AND NOT EXISTS
(
SELECT 1
FROM @DataSource AS ds
WHERE
ds.ItemId = ci.ItemId
AND ds.CategoryId = ci.CategoryId
);
INSERT @CategoryItem
SELECT
ds.CategoryId,
ds.ItemId
FROM @DataSource AS ds
WHERE
ds.CategoryId = 2
;
```
[1]: https://i.stack.imgur.com/foVhU.jpg
[2]: https://technet.microsoft.com/en-us/library/bb522522%28v=sql.105%29.aspx
[3]: https://technet.microsoft.com/en-us/library/bb510625%28v=sql.105%29.aspx
[4]: https://technet.microsoft.com/en-us/library/cc879317%28v=sql.105%29.aspx
[5]: https://i.stack.imgur.com/taZmk.jpg