sql-server add tag
Fabian Schmied (imported from SE)
**TL;DR:** The question below boils down to: When inserting a row, is there a window of opportunity between the _generation_ of a new `Identity` value and the _locking_ of the corresponding row key in the clustered index, where an external observer could see a _newer_ `Identity` value inserted by a concurrent transaction? (In SQL Server.)

**Detailed version**

I have a SQL Server table with an `Identity` column called `CheckpointSequence`, which is the key of the table's clustered index (which also has a number of additional nonclustered indexes). Rows are _inserted_ into the table by several concurrent processes and threads (at isolation level `READ COMMITTED`, and without `IDENTITY_INSERT`). At the same time, there are processes periodically _reading_ rows from the clustered index, ordered by that `CheckpointSequence` column (also at isolation level `READ COMMITTED`, with the `READ COMMITTED SNAPSHOT` option being turned off).

I currently rely on the fact that the reading processes can never "skip" a checkpoint. My question is: **Can I rely on this property?** And if not, what could I do to make it true?

Example: When rows with identity values 1, 2, 3, 4, and 5 are inserted, the reader _must not_ see the row with value 5 prior to seeing the one with value 4. Tests show that the query, which contains an `ORDER BY CheckpointSequence` clause (and a `WHERE CheckpointSequence > -1` clause), reliably blocks whenever row 4 is to be read, but not yet committed, even if row 5 has already been committed.

I believe that at least in theory, there may be a race condition here that might cause this assumption to break. Unfortunately, documentation on `Identity` doesn't say a lot about how `Identity` works in the context of multiple concurrent transactions, it only says "Each new value is generated based on the current seed & increment." and "Each new value for a particular transaction is different from other concurrent transactions on the table." ([MSDN][1])

My reasoning is, it must work somehow like this:

1. A transaction is started (either explicitly or implicitly).
2. An identity value (X) is generated.
3. The corresponding row lock is taken on the clustered index based on the identity value (unless lock escalation kicks in, in which case the whole table is locked).
4. The row is inserted.
5. The transaction is committed (possibly quite a lot of time later), so the lock is removed again.

I think that between step 2 and 3, there is a very tiny window where

- a concurrent session could generate the next identity value (X+1) and execute all the remaining steps,
- thus allowing a reader coming exactly at that point of time to read the value X+1, missing the value of X.

Of course, the probability of this seems extremely low; but still - it could happen. Or could it?

(If you're interested in the context: This is the implementation of [NEventStore's SQL Persistence Engine][2]. NEventStore implements an append-only event store where every event gets a new, ascending checkpoint sequence number. Clients read events from the event store ordered by checkpoint in order to perform computations of all sorts. Once an event with checkpoint X has been processed, clients only consider "newer" events, i.e., events with checkpoint X+1 and above. Therefore, it is vital that events can never be skipped, as they'd never be considered again. I'm currently trying to determine if the `Identity`-based checkpoint implementation meets this requirement. These are the **exact SQL statements used**: [Schema][3], [Writer's query][4], [Reader's Query][5].)

If I'm right and the situation described above could arise, I can see only two options of dealing with them, both of which are unsatisfactory:

- When seeing a checkpoint sequence value X+1 before having seen X, dismiss X+1 and try again later. However, because `Identity` can of course produce gaps (e.g., when the transaction is rolled back), X might never come.
- So, same approach, but accept the gap after n milliseconds. However, what value of n should I assume?

Any better ideas?

  [1]: https://msdn.microsoft.com/en-us/library/ms186775.aspx
  [2]: https://github.com/NEventStore/NEventStore.Persistence.SQL
  [3]: https://gist.github.com/fschmied/76bb2493711370ca5014
  [4]: https://gist.github.com/fschmied/d3bf0b846000347ce136
  [5]: https://gist.github.com/fschmied/47f716c32cb64b852f90
Top Answer
Paul White (imported from SE)
> When inserting a row, is there a window of opportunity between the generation of a new Identity value and the locking of the corresponding row key in the clustered index, where an external observer could see a newer Identity value inserted by a concurrent transaction?


The *allocation of identity values is independent of the containing user transaction*. This is one reason that identity values are consumed even if the transaction is rolled back. The increment operation itself is protected by a latch to prevent corruption, but that is the extent of the protections.

In the specific circumstances of your implementation, the identity allocation (a call to `CMEDSeqGen::GenerateNewValue`) is made *before the user transaction for the insert is even made active* (and so before any locks are taken).

By running two inserts concurrently with a debugger attached to allow me to freeze one thread just after the identity value is incremented and allocated, I was able to reproduce a scenario where:

1. Session 1 acquires an identity value (3)
2. Session 2 acquires an identity value (4)
3. Session 2 performs its insert and commits (so row 4 is fully visible)
4. Session 1 performs its insert and commits (row 3)

After step 3, a query using *row_number* under locking read committed returned the following:


In your implementation, this would result in Checkpoint ID 3 being skipped incorrectly.

The window of misopportunity is relatively small, but it exists. To give a more realistic scenario than having a debugger attached: An executing query thread can yield the scheduler after step 1 above. This allows a second thread to allocate an identity value, insert and commit, before the original thread resumes to perform its insert.

For clarity, there are no locks or other synchronization objects protecting the identity value after it is allocated and before it is used. For example, after step 1 above, a concurrent transaction can see the new identity value using T-SQL functions like `IDENT_CURRENT` before the row exists in the table (even uncommitted).

Fundamentally, there are no more guarantees around identity values than [documented][2]:

* Each new value is generated based on the current seed & increment.
* Each new value for a particular transaction is different from other concurrent transactions on the table.

That really is it.

If *strict* transactional FIFO processing is required, you likely have no choice but to serialize manually. If the application has less oneous requirements, you have more options. The question isn't 100% clear in that regard. Nevertheless, you may find some useful information in Remus Rusanu's article [Using Tables as Queues][3].

  [1]: https://i.stack.imgur.com/cOyjz.png
  [2]: https://msdn.microsoft.com/en-us/library/ms186775.aspx
  [3]: http://rusanu.com/2010/03/26/using-tables-as-queues/

This room is for discussion about this question.

Once logged in you can direct comments to any contributor here.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.