sql-server add tag
Paul White (imported from SE)
Given the following table, unique clustered index, and statistics:

    CREATE TABLE dbo.Banana
    (
        pk integer NOT NULL, 
        c1 char(1) NOT NULL, 
        c2 char(1) NOT NULL
    );
    
    CREATE UNIQUE CLUSTERED INDEX pk ON dbo.Banana (pk);
    
    CREATE STATISTICS c1 ON dbo.Banana (c1);
    CREATE STATISTICS c2 ON dbo.Banana (c2);
    
    INSERT dbo.Banana 
        (pk, c1, c2) 
    VALUES 
        (1, 'A', 'W'), 
        (2, 'B', 'X'), 
        (3, 'C', 'Y'), 
        (4, 'D', 'Z');
    
    -- Populate statistics
    UPDATE STATISTICS dbo.Banana;

[![Example data][1]][1]

The statistics row modification counters obviously show zero before any updates:

    -- Show statistics modification counters
    SELECT
        stats_name = S.[name], 
        DDSP.stats_id,
        DDSP.[rows],
        DDSP.modification_counter
    FROM sys.stats AS S
    CROSS APPLY sys.dm_db_stats_properties(S.object_id, S.stats_id) AS DDSP
    WHERE
        S.[object_id] = OBJECT_ID(N'dbo.Banana', N'U');

[![Zero modification counters][2]][2]

Incrementing each `pk` column value by one for every row:

    -- Increment pk in every row
    UPDATE dbo.Banana 
    SET pk += 1;

Uses the execution plan:

[![Split Sort Collapse execution plan][3]][3]

It produces the following statistics modification counters:

[![Post-update modification counters][4]][4]

### Questions

1. What do the Split, Sort, and Collapse operators do?
2. Why do the `pk` stats show 2 modifications, but `c1` and `c2` show 5?


  [1]: https://i.stack.imgur.com/Cf0mu.png
  [2]: https://i.stack.imgur.com/jWStU.png
  [3]: https://i.stack.imgur.com/npF0U.png
  [4]: https://i.stack.imgur.com/4vIJB.png
Top Answer
Paul White (imported from SE)
SQL Server always uses the Split, Sort, and Collapse combination of operators when [maintaining a unique index][1] as part of an update that affects (or might affect) more than one row.

Working through the example in the question, we could write the update as a separate single-row update for each of the four rows present:

    -- Per row updates
    UPDATE dbo.Banana SET pk = 2 WHERE pk = 1;
    UPDATE dbo.Banana SET pk = 3 WHERE pk = 2;
    UPDATE dbo.Banana SET pk = 4 WHERE pk = 3;
    UPDATE dbo.Banana SET pk = 5 WHERE pk = 4;

The problem is that the first statement would fail, since it changes `pk` from 1 to 2, and there is already a row where `pk` = 2. The SQL Server storage engine requires that unique indexes remain unique at every stage of processing, even within a single statement. This is the problem solved by Split, Sort, and Collapse.

### Split [![Split][2]][2]

The first step is to Split each update statement into a delete followed by an insert:

    DELETE dbo.Banana WHERE pk = 1;
    INSERT dbo.Banana (pk, c1, c2) VALUES (2, 'A', 'W');
    
    DELETE dbo.Banana WHERE pk = 2;
    INSERT dbo.Banana (pk, c1, c2) VALUES (3, 'B', 'X');
    
    DELETE dbo.Banana WHERE pk = 3;
    INSERT dbo.Banana (pk, c1, c2) VALUES (4, 'C', 'Y');
    
    DELETE dbo.Banana WHERE pk = 4;
    INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');

The Split operator adds an action code column to the stream (here labelled Act1007):

[![Split properties][3]][3]

The action code is 1 for an update, 3 for a delete, and 4 for an insert. 

### Sort [![Sort][4]][4]

The split statements above would still produce a false transient unique key violation, so the next step is to sort the statements by the keys of the unique index being updated (`pk` in this case), then by the action code. For this example, this simply means that deletes (3) on the same key are ordered before inserts (4). The resulting order is:

    -- Sort (pk, action)
    DELETE dbo.Banana WHERE pk = 1;
    DELETE dbo.Banana WHERE pk = 2;
    INSERT dbo.Banana (pk, c1, c2) VALUES (2, 'A', 'W');
    DELETE dbo.Banana WHERE pk = 3;
    INSERT dbo.Banana (pk, c1, c2) VALUES (3, 'B', 'X');
    DELETE dbo.Banana WHERE pk = 4;
    INSERT dbo.Banana (pk, c1, c2) VALUES (4, 'C', 'Y');
    INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');

[![Sort properties][5]][5]

### Collapse [![Collapse][6]][6]

The preceding stage is enough to guarantee the avoidance of false uniqueness violations in all cases. As an optimization, Collapse combines *adjacent* deletes and inserts *on the same key value* into an update:

    -- Collapse (pk)
    DELETE dbo.Banana WHERE pk = 1;
    UPDATE dbo.Banana SET c1 = 'A', c2 = 'W' WHERE pk = 2;
    UPDATE dbo.Banana SET c1 = 'B', c2 = 'X' WHERE pk = 3;
    UPDATE dbo.Banana SET c1 = 'C', c2 = 'Y' WHERE pk = 4;
    INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');

The delete/insert pairs for `pk` values 2, 3, and 4 have been combined into an update, leaving a single delete on `pk` = 1, and an insert for `pk` = 5.

The Collapse operator groups rows by the key columns, and updates the action code to reflect the collapse outcome:

[![Collapse properties][7]][7]

### Clustered Index Update [![Clustered index update][8]][8]

This operator is labelled as an Update, but it is capable of inserts, updates, and deletes. Which action is taken by the Clustered Index Update per row is determined by the value of the action code in that row. The operator has an Action property to reflect this mode of operation:

[![Clustered Index Update action property][9]][9]

---

## Row modification counters

Note that the three updates above **do not modify the key(s)** of the unique index being maintained. In effect, we have transformed updates to the *key* columns in the index into updates of the *non-key* columns (`c1` and `c2`), plus a delete and an insert. Neither a delete nor an insert can cause a false unique-key violation.

An insert or a delete affects every single column in the row, so statistics associated with every column will have their modification counters incremented. For update(s), only statistics with any of the updated columns as the *leading column* have their modification counters incremented (even if the value is unchanged).

The statistics row modification counters therefore show 2 changes to `pk`, and 5 for `c1` and `c2`:

    -- Collapse (pk)
    DELETE dbo.Banana WHERE pk = 1;                         -- All columns modified
    UPDATE dbo.Banana SET c1 = 'A', c2 = 'W' WHERE pk = 2;  -- c1 and c2 modified
    UPDATE dbo.Banana SET c1 = 'B', c2 = 'X' WHERE pk = 3;  -- c1 and c2 modified
    UPDATE dbo.Banana SET c1 = 'C', c2 = 'Y' WHERE pk = 4;  -- c1 and c2 modified
    INSERT dbo.Banana (pk, c1, c2) VALUES (5, 'D', 'Z');    -- All columns modified

**Note:** Only changes applied to the **base object** (heap or clustered index) affect statistics row modification counters. Non-clustered indexes are secondary structures, reflecting changes already made to the base object. They do not affect statistics row modification counters at all.

If an object has multiple unique indexes, a separate Split, Sort, Collapse combination is used to organize the updates to each. SQL Server optimizes this case for nonclustered indexes by saving the result of the Split to an Eager Table Spool, then replaying that set for each unique index (which will have its own Sort by index keys + action code, and Collapse).

### Effect on statistics updates

Automatic statistics updates (if enabled) occur when the query optimizer needs statistical information and notices that existing statistics are **out of date** (or invalid due to a schema change). Statistics are considered out of date when the number of modifications recorded exceed a threshold.

The Split/Sort/Collapse arrangement results in **different** row modifications being recorded than would be expected. This, in turn, means that a statistics update may be triggered sooner or later than would be the case otherwise.

In the example above, row modifications for the key column increase by 2 (the net change) rather than 4 (one for each table row affected), or 5 (one for each delete/update/insert produced by the Collapse).

In addition, non-key columns that were logically **not changed** by the original query accumulate row modifications, which may number as many as **double** the table rows updated (one for each delete, and one for each insert).

---

The number of changes recorded depend on the degree of overlap between old and new key column values (and so the degree to which the separate deletes and inserts can be collapsed). Resetting the table between each execution, the following queries demonstrate the effect on row modification counters with different overlaps:

    UPDATE dbo.Banana SET pk = pk + 0; -- Full overlap

[![pk = pk + 0][10]][10]

    UPDATE dbo.Banana SET pk = pk + 1;

[![pk = pk + 1][11]][11]

    UPDATE dbo.Banana SET pk = pk + 2;

[![pk = pk + 2][12]][12]

    UPDATE dbo.Banana SET pk = pk + 3;

[![pk = pk + 3][13]][13]

    UPDATE dbo.Banana SET pk = pk + 4; -- No overlap

[![pk = pk + 4][14]][14]


  [1]: https://blogs.msdn.microsoft.com/craigfr/2007/09/06/maintaining-unique-indexes/
  [2]: https://i.stack.imgur.com/mG7va.png
  [3]: https://i.stack.imgur.com/H56Nr.png
  [4]: https://i.stack.imgur.com/8bv3p.png
  [5]: https://i.stack.imgur.com/gIh83.png
  [6]: https://i.stack.imgur.com/HkPZd.png
  [7]: https://i.stack.imgur.com/vLXyG.png
  [8]: https://i.stack.imgur.com/MSNaB.png
  [9]: https://i.stack.imgur.com/WXyKa.png
  [10]: https://i.stack.imgur.com/dMpQV.png "pk = pk + 0"
  [11]: https://i.stack.imgur.com/kLzJr.png "pk = pk + 1"
  [12]: https://i.stack.imgur.com/Q1WXk.png "pk = pk + 2"
  [13]: https://i.stack.imgur.com/Y4iTk.png "pk = pk + 3"
  [14]: https://i.stack.imgur.com/iNeZB.png "pk = pk + 4"

This room is for discussion about this question.

Once logged in you can direct comments to any contributor here.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.