Niels Broertjes (imported from SE)
We are having dirty reads or phantom rows in our data warehouse. My question is not how to solve it (it is a long story and I am currently working on it), but I want to know what exactly happens to better understand the behaviour and use the correct terminology. Are these dirty reads or phantom rows? Let's say we have `SourceTableA` in the ERP database with a `CLUSTERED UNIQUE` index on `Col1` and `Col2`. The DWH extraction process that performs the following query: ``` SELECT Col1, Col2, Col3 FROM ERPdatabase..SourceTableA WITH(NOLOCK); ``` This is not recommended, because now we can even have dirty reads, but we are forced to do this by the ERP vendor. Long story, so never mind. Say we have `RowA` in `SourceTableA`: ``` ('GreatCompany','WonderfullItem','ColorRed') ``` Now (I think...) a user session in the ERP system performs an update on `RowA` while the previous query is running like this: ``` UPDATE SourceTableA SET Col3 = 'ColorBlue' WHERE Col1 = 'GreatCompany' AND Col2 = 'WonderfullItem' ``` If I am correct or not, what I see happening is that 2 rows are being read by the extraction process. I am not surprised since we use `NOLOCK` (although I am unsure if I understand the exact cause, hence this question), but I have 2 questions about this: 1. Since the row is not moved around in the CLUSTERED index, because the primary key does not change, why is it read a second time? My thinking would be that the extract process is already past that row, so it should not encounter it anymore. Or is that not how it works? The only explanation I can think of is that the extract query is not using the CLUSTERED index to read the data, but maybe a NONCLUSTERED index where the row IS moved around. Or what else am I missing? Or is it just a dirty read? Or is it just 'SQL Server does not guarantee row order unless you use an ORDER BY, DISTINCT or do not use NOLOCK'? Then I am curious why exactly. :-) 2. Regarding dirty reads, I always thought the behaviour I see were phantom rows, but, to my understanding, for a phantom row to occur we actually need 2 statements within the same transaction that read the same set of data. That is not the case here since it is only 1 statement, so implicitly I think that this should then be a dirty read? On the other hand, for a dirty read to occur we need an update to be rolled back by another transaction. That is not what I assume is happening. Or is that actually the only explanation?