Is it legal for SQL Server to fill PERSISTED columns with data that does not match the definition? - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

Is it legal for SQL Server to fill PERSISTED columns with data that does not match the definition?

sql-server sql-server-2014 add tag

usr (imported from SE)

I'm following up on [this question][1] about strange values in a `PERSISTED` computed column. The answer there makes a few guesses about how this behavior came to be.

I'm asking the following: Is this not an outright bug? Are `PERSISTED` columns ever allowed to behave this way?

    DECLARE @test TABLE (
    	Col1 INT,
    	Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED) --depends on Col1
    
    INSERT INTO @test (Col1) VALUES
    	(ABS(CHECKSUM(NEWID()) % 5)),
    	(ABS(CHECKSUM(NEWID()) % 5)),
    	(ABS(CHECKSUM(NEWID()) % 5)),
    	(ABS(CHECKSUM(NEWID()) % 5)),
    	(ABS(CHECKSUM(NEWID()) % 5))
    
    SELECT * FROM @test --shows impossible data

    UPDATE @test SET Col1 = Col1*1 --"fix" the data by rewriting it

    SELECT * FROM @test --observe fixed data
    
    /*
    Col1	Contains2
	2	0
	2	0
	0	1
	4	0
	3	0

    Col1	Contains2
	2	1
	2	1
	0	0
	4	0
	3	0
    */

Note, that the data appears "impossible" because the values of the computed column do not correspond to its definition.

It is well known that non-deterministic functions in queries can behave strangely but here this seems to violate the contract of persisted computed columns and, therefore, should be illegal.

Inserting random numbers might be a contrived scenario but what if we were inserting `NEWID()` values or `SYSUTCDATETIME()`? I think this is a relevant issue that might practically manifest itself.

  [1]: https://stackoverflow.com/questions/38120684/inconsistent-results-with-newid-and-persistent-computed-column

Top Answer

Martin Smith (imported from SE)

This is certainly a bug. The fact that the `col1` values happened to be the result of an expression involving random numbers clearly doesn't change what the the correct value for `col2` is supposed to be. `DBCC CHECKDB` returns an error if this is run against a permanent table.

    create table test (
        Col1 INT,
        Contains2 AS CASE WHEN 2 IN (Col1) THEN 1 ELSE 0 END PERSISTED);
    
    INSERT INTO test (Col1) VALUES
        (ABS(CHECKSUM(NEWID()) % 5)),
        (ABS(CHECKSUM(NEWID()) % 5)),
        (ABS(CHECKSUM(NEWID()) % 5)),
        (ABS(CHECKSUM(NEWID()) % 5)),
        (ABS(CHECKSUM(NEWID()) % 5));
    
    DBCC CHECKDB

Gives (for my test run which had one "impossible" row)

    Msg 2537, Level 16, State 106, Line 17
    Table error: object ID 437576597, index ID 0, partition ID 72057594041008128, alloc unit ID 72057594046251008 (type In-row data), page (1:121), row 0. The record check (valid computed column) failed. The values are 2 and 0.
    DBCC results for 'test'.
    There are 5 rows in 1 pages for object "test".
    CHECKDB found 0 allocation errors and 1 consistency errors in table 'test' (object ID 437576597).

It does also report that 

> repair_allow_data_loss is the minimum repair level for the errors
> found by DBCC CHECKDB

And if taken up on the repair option unceremoniously deletes the whole row as it has no way of telling which column is corrupted.

Attaching a debugger shows that the `NEWID()` is being evaluated twice per inserted row. Once before the `CASE` expression is evaluated and once inside it.

[![enter image description here][1]][1]

A possible workaround might be to use

    INSERT INTO @test
                (Col1)
    SELECT ( ABS(CHECKSUM(NEWID()) % 5) )
    FROM   (VALUES (1),(1),(1),(1),(1)) V(X); 

Which for one reason or another avoids the issue and only evaluates the expression once per row. 

  [1]: https://i.stack.imgur.com/HHKF7.png

1 Answer