Are duplicates in Read Commited standard behaviour?

sql-server add tag

7 years ago kan (imported from SE)

This refers to this question: https://dba.stackexchange.com/questions/185897/duplicate-records-returned-from-table-with-no-duplicates
The “read commited” isolation level assumes that one transaction could read same record differently, but it does not explicitly imply that records could be duplicated. I agree that two statements in the same transactions could return different data sets, but returning same data set with record duplication sounds insane.
As far as I know only MS SQL Server behaves this way.
Please prove or disprove if it conforms to “Read Committed” isolation level defined by ANSI SQL standard.

Top Answer

7 years ago Paul White (imported from SE)

The SQL Standard isolation level definitions are (probably deliberately) imprecise. They are given in terms of allowable concurrency phenomena:

P1 - dirty read
P2 - non-repeatable read
P3 - phantom

At READ COMMITTED isolation, only P1 is disallowed.

extract from ISO/IEC 9075-2

Reading the “same record” more than once due to movement within a physical structure is therefore not disallowed, so long as each read is not a dirty read (i.e. the read is performed on committed data).

Therefore, SQL Server’s locking implementation of READ COMMITTED isolation conforms with the standard.

I understand the behaviour may be surprising, but the root of that surprise is in misunderstanding what this isolation level does, and does not, guarantee. For example, one might expect a concurrent UPDATE never to change the number of records returned by a SELECT query. This is a reasonable expectation, but not one that must be met to conform with the standard.

If you want a point-in-time (stable) view of the data, locking read committed is not the isolation level implementation you need. While not mandated by the standard, read committed using row versioning (RCSI) happens to provide a statement-level point-in-time consistent view of the committed state of the database. For a transaction-level point-in-time consistent view of the committed state of the database, you would need snapshot isolation (SI).

As far as I know only MS SQL Server behaves this way.

It would be possible in any database that implements read committed using locks (specifically short-term locks for reads) and where a concurrent modification might cause a record to move ahead of the current scan position. In other words, it is implementation-specific.

Related reading:

A Critique of ANSI SQL Isolation Levels by Hal Berenson, Phil Bernstein, Jim Gray, Jim Melton, Elizabeth O’Neil, and Patrick O’Neil.
The Read Committed Isolation Level by me

Answer #2

7 years ago dan guzman (imported from SE)

It is true that one can read the same row more than once during a SELECT query running under the READ COMMITTED isolation level (without the READ COMMITTED_SNAPSHOT database option) but the subsequent row is not a duplicate. The “duplicate” is actually a modified version of the row originally read. One or more of the row’s column values will be different than that of the initial read, which is how non-repeatable read is defined under the ISO/ANSI SQL standard per Paul’s quote.

This phenomenon can occur in SQL Server when an index key value of a row is updated during a SELECT query scanning that index using ordered scan. I’m not aware of a way for this to occur with an allocation ordered scan of a heap because the original row location (with forwarding pointer) remains unchanged when the physical row must be moved to a new page to accommodate a larger row length.

2 Answers