[Wikipedia says] that Read Committed is prone to non-repeatable reads. However, I can simply cache the first read result in my transaction (make a snapshot) and release the DB lock so that other transactions can update the rows I have read. Effectively, I achieve repeatable reads at the cost of only RC isolation and, thus, higher performance than Repeatable Read offers. Nothing bad can happen, and I can feel as safe as when using Repeatable Read isolation, right? No: I guess that holding a read lock on the data I have read helps to prevent some undesirable scenarios, probably in banking or booking reservations. Which ones?
What I am looking for is an example where my emulation of RR fails. I have demonstrated that I can make the reads repeatable simply by caching the data item on its first read access. I want an example where my emulation is not sufficient. What is bad about releasing the lock immediately after reading under this strategy?
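To make the strategy concrete, here is roughly what I mean, as a Python sketch (the `CachingTransaction` class and the plain-dict "database" are invented for illustration; a real driver would take and release a row lock around the first read):

```python
# Hypothetical sketch of the "Read Committed + cache (snapshot)" emulation.
# The cache guarantees that re-reading a key inside the transaction returns
# the same value, even though the database lock was released after the first
# (committed) read.

class CachingTransaction:
    def __init__(self, db):
        self.db = db      # any mapping-like store read under Read Committed
        self.cache = {}   # snapshot of everything read so far

    def read(self, key):
        if key not in self.cache:
            # First access: read the committed value, then release the lock
            # immediately (as Read Committed does between statements).
            self.cache[key] = self.db[key]
        return self.cache[key]   # later reads repeat the cached value


db = {"x": 1}
txn = CachingTransaction(db)
first = txn.read("x")    # reads the committed value 1
db["x"] = 99             # another transaction commits an update meanwhile
second = txn.read("x")   # still 1: the read is "repeatable"
```

So within my transaction the reads are stable, without holding any lock while other writers proceed.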
The Wikipedia article and the name *Repeatable Read* seem to imply that this is all we need. But I suspect that *Repeatable Read*, by taking the shared lock, sacrifices performance to provide something more important than mere read consistency, since the latter can be achieved with a simple *Read Committed + cache (snapshot)* combination. I guess that in *2PL* the read locks persist for the whole transaction, and that we have shared/read locks, as in *multiple readers / single writer*, for a greater purpose than just providing repeatable reads within a single transaction. Consequently, `Repeatable Read` would be a misnomer that hides the fact of locking, which actually sacrifices performance for something larger than consistent reads.
The choice of isolation level is driven by the guarantees your application requires from the database. Full ACID isolation is only provided by serializable isolation; all other isolation levels provide a lesser degree of protection for transactions.
> Read Committed admits inconsistent results in repeated reads
The only guarantee provided by read committed isolation is that only committed data will be read (dirty reads are disallowed). Anything beyond that is implementation specific.
For example, in SQL Server, read committed *using row versioning* (RCSI) also guarantees *statement-level* repeatable reads and a *statement-level* point-in-time view of the database. To be clear, neither of those guarantees holds across multiple statements within the same transaction under RCSI (transaction-level guarantees are provided by Snapshot Isolation).
SQL Server's implementation of read committed using *locking* allows repeated reads *within the same statement* to encounter different values, and allows the database to return data from different points in time.
To emphasise: a single statement that touches the same row more than once may encounter different values. The multiple reads may not be apparent from inspecting the SQL alone, since the query optimizer has considerable freedom in constructing a physical execution strategy.
> I can simply cache the first read result in my transaction and let other transactions update...
This is a valid approach, if it meets the isolation requirements of whatever it is you are doing with (or based on) the data.
> Nothing bad can happen and I can feel as safe as when using Repeatable Read isolation, right?
Not generally, no. Repeatable read (RR) isolation provides more guarantees than read committed (RC) isolation. Specifically, RR guarantees that data *will not change for the life of the transaction once the data item has been read for the first time*. RR does not guarantee that reads will in fact be entirely repeatable because new data may appear in subsequent reads (phantoms).
Note that implementations are free to exceed the base requirements of each isolation level, so some products may in fact provide equal guarantees under RC as RR (this is not the case for SQL Server). Indeed, an implementation could make all isolation levels equal to serializable, and still be in compliance with the standard.
> No, I guess that holding a read lock for the data I read helps to prevent some undesirable scenarios...
Locking is an implementation detail. It is better to think about the *degree of isolation* your use of the data requires to behave correctly in all circumstances (when multiple concurrent data changes are possible).
> ...to prevent some undesirable scenarios in booking reservations, probably. Which ones?
The ways in which multiple concurrent database transactions can interact unhelpfully are limited only by one's imagination.
For example, using RC and caching the value read could result in a lost update, where another transaction updates based on the stored value, before your transaction does the same thing. In terms of histories for two transactions running `x:= x + 1`:
```
transaction 1 reads x           (x = 1)
transaction 2 reads x           (x = 1)
transaction 2 writes x := x + 1 (x = 2)
transaction 2 commits
transaction 1 writes x := x + 1 (x = 2)
transaction 1 commits
```
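That history can be replayed directly against the caching emulation from the question. Both transactions read the committed value 1, each caches it, and the second write silently overwrites the first (a Python sketch; the `Txn` class is illustrative, not any real API):

```python
# Replaying the lost-update history with the Read Committed + cache strategy.
# Each transaction caches its first read, so its repeated reads are stable,
# yet the interleaved writes still lose one of the two increments.

db = {"x": 1}

class Txn:
    def __init__(self, db):
        self.db, self.cache = db, {}

    def read(self, key):
        # First read caches the committed value; later reads repeat it.
        return self.cache.setdefault(key, self.db[key])

    def write(self, key, value):
        self.db[key] = value  # under RC, a lock is held only for the write

t1, t2 = Txn(db), Txn(db)
t1.read("x")                        # transaction 1 reads x (= 1)
t2.read("x")                        # transaction 2 reads x (= 1)
t2.write("x", t2.read("x") + 1)     # transaction 2 writes x := 2, commits
t1.write("x", t1.read("x") + 1)     # transaction 1 writes x := 2, commits

# db["x"] ends at 2, not 3: transaction 2's increment was lost.
```

Under repeatable read with 2PL-style locking, transaction 1's shared lock on `x` would prevent transaction 2's write from slipping in between the two operations (in practice the lock-upgrade conflict typically surfaces as blocking or a deadlock, with one transaction rolled back and retried, so the final value is 3).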
Only serializable isolation guarantees that a transaction which can be shown to produce correct results with no concurrent activity will continue to do so when competing with any combination of concurrent transactions.
Note that this does not mean that concurrent serializable transactions actually execute sequentially, just that they will have the same effects as if they had (in some unspecified order).
Specifying any other isolation level involves a trade-off: potentially higher performance and/or concurrency with fewer isolation guarantees. Exactly what the trade-off is varies significantly between implementations.
Choosing the appropriate isolation level requires comparing the needs of your application with the guarantees available. In addition, you may need to consider other factors, for example whether the transaction requires a point-in-time view of the data or not.
For more information and some examples from a SQL Server point of view, please refer to my articles in [SQL Server Isolation Levels: A Series].