sql-server
Joe Obbish (imported from SE)
I was able to reproduce a query performance issue that I would describe as unexpected. I'm looking for an answer that's focused on internals.

On my machine, the following query does a clustered index scan and takes about 6.8 seconds of CPU time:

    SELECT ID1, ID2
    FROM two_col_key_test WITH (FORCESCAN)
    WHERE ID1 NOT IN
    (
    N'1', N'2',N'3', N'4', N'5',
    N'6', N'7', N'8', N'9', N'10',
    N'11', N'12',N'13', N'14', N'15',
    N'16', N'17', N'18', N'19', N'20'
    )
    AND (ID1 = N'FILLER TEXT' AND ID2 >= N'' OR (ID1 > N'FILLER TEXT'))
    ORDER BY ID1, ID2 OFFSET 12000000 ROWS FETCH FIRST 1 ROW ONLY
    OPTION (MAXDOP 1);

The following query does a clustered index seek (only difference is removing the `FORCESCAN` hint) but takes about 18.2 seconds of CPU time:

    SELECT ID1, ID2
    FROM two_col_key_test
    WHERE ID1 NOT IN
    (
    N'1', N'2',N'3', N'4', N'5',
    N'6', N'7', N'8', N'9', N'10',
    N'11', N'12',N'13', N'14', N'15',
    N'16', N'17', N'18', N'19', N'20'
    )
    AND (ID1 = N'FILLER TEXT' AND ID2 >= N'' OR (ID1 > N'FILLER TEXT'))
    ORDER BY ID1, ID2 OFFSET 12000000 ROWS FETCH FIRST 1 ROW ONLY
    OPTION (MAXDOP 1);

The query plans are pretty similar. For both queries, 12,000,001 rows are read from the clustered index:

[![query plans][1]][1]
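
One simple way to reproduce the CPU time comparison (for example, in SSMS) is to capture time and I/O statistics around each run and compare the "CPU time" values reported on the Messages tab; a minimal sketch:

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- run the scan and seek queries here, one at a time,
    -- and compare the "CPU time" values reported for each

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;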

I am on SQL Server 2017 CU 10. Here is code to create and populate the `two_col_key_test` table:

    DROP TABLE IF EXISTS dbo.two_col_key_test;
    
    CREATE TABLE dbo.two_col_key_test (
        ID1 NVARCHAR(50) NOT NULL,
        ID2 NVARCHAR(50) NOT NULL,
        FILLER NVARCHAR(50),
        PRIMARY KEY (ID1, ID2)
    );
    
    DROP TABLE IF EXISTS #t;
    
    -- 4,000-row helper table
    SELECT TOP (4000) 0 ID INTO #t
    FROM master..spt_values t1
    CROSS JOIN master..spt_values t2
    OPTION (MAXDOP 1);
    
    -- 16,000,000 rows: the first 8,000,000 get ID1 = N'FILLER TEXT',
    -- the rest get ID1 = N'FILLER TEXT 2'
    INSERT INTO dbo.two_col_key_test WITH (TABLOCK)
    SELECT N'FILLER TEXT' + CASE WHEN ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) > 8000000 THEN N' 2' ELSE N'' END
    , ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    , NULL
    FROM #t t1
    CROSS JOIN #t t2;
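
As a quick sanity check on the repro data, the load above should produce 16,000,000 rows split evenly between the two `ID1` values (a simple verification query, not required for the repro):

    SELECT ID1, COUNT_BIG(*) AS row_count
    FROM dbo.two_col_key_test
    GROUP BY ID1
    ORDER BY ID1;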

I am hoping for an answer that goes beyond call stack reporting. For example, I can see that `sqlmin!TCValSSInRowExprFilter<231,0,0>::GetDataX` takes significantly more CPU cycles in the slow query than in the fast one:

[![PerfView][2]][2]

Instead of stopping there, I'd like to understand what that is and why there's such a large difference between the two queries.

Why is there a large difference in CPU time for these two queries?


  [1]: https://i.stack.imgur.com/sJNM2.png
  [2]: https://i.stack.imgur.com/s99dW.png
Top Answer
Paul White (imported from SE)
> Why is there a large difference in CPU time for these two queries?

The scan plan evaluates the following pushed non-sargable (residual) predicate for each row:

    [two_col_key_test].[ID1]<>N'1' 
    AND [two_col_key_test].[ID1]<>N'10' 
    AND [two_col_key_test].[ID1]<>N'11' 
    AND [two_col_key_test].[ID1]<>N'12' 
    AND [two_col_key_test].[ID1]<>N'13' 
    AND [two_col_key_test].[ID1]<>N'14' 
    AND [two_col_key_test].[ID1]<>N'15' 
    AND [two_col_key_test].[ID1]<>N'16' 
    AND [two_col_key_test].[ID1]<>N'17' 
    AND [two_col_key_test].[ID1]<>N'18' 
    AND [two_col_key_test].[ID1]<>N'19' 
    AND [two_col_key_test].[ID1]<>N'2' 
    AND [two_col_key_test].[ID1]<>N'20' 
    AND [two_col_key_test].[ID1]<>N'3' 
    AND [two_col_key_test].[ID1]<>N'4' 
    AND [two_col_key_test].[ID1]<>N'5' 
    AND [two_col_key_test].[ID1]<>N'6' 
    AND [two_col_key_test].[ID1]<>N'7' 
    AND [two_col_key_test].[ID1]<>N'8' 
    AND [two_col_key_test].[ID1]<>N'9' 
    AND 
    (
        [two_col_key_test].[ID1]=N'FILLER TEXT' 
        AND [two_col_key_test].[ID2]>=N'' 
        OR [two_col_key_test].[ID1]>N'FILLER TEXT'
    )

[![scan residual][1]][1]

The seek plan performs two seek operations:

    Seek Keys[1]:
        Prefix: [two_col_key_test].ID1 = Scalar Operator(N'FILLER TEXT'),
        Start:  [two_col_key_test].ID2 >= Scalar Operator(N'')
    Seek Keys[1]:
        Start:  [two_col_key_test].ID1 > Scalar Operator(N'FILLER TEXT')

...to match this part of the predicate:

    (ID1 = N'FILLER TEXT' AND ID2 >= N'' OR (ID1 > N'FILLER TEXT'))

A residual predicate is applied to rows that pass the seek conditions above (all rows in your example). 

However, each inequality is replaced by two separate tests for *less than* `OR` *greater than*:

    ([two_col_key_test].[ID1]<N'1' OR [two_col_key_test].[ID1]>N'1') 
    AND ([two_col_key_test].[ID1]<N'10' OR [two_col_key_test].[ID1]>N'10') 
    AND ([two_col_key_test].[ID1]<N'11' OR [two_col_key_test].[ID1]>N'11') 
    AND ([two_col_key_test].[ID1]<N'12' OR [two_col_key_test].[ID1]>N'12') 
    AND ([two_col_key_test].[ID1]<N'13' OR [two_col_key_test].[ID1]>N'13') 
    AND ([two_col_key_test].[ID1]<N'14' OR [two_col_key_test].[ID1]>N'14') 
    AND ([two_col_key_test].[ID1]<N'15' OR [two_col_key_test].[ID1]>N'15') 
    AND ([two_col_key_test].[ID1]<N'16' OR [two_col_key_test].[ID1]>N'16') 
    AND ([two_col_key_test].[ID1]<N'17' OR [two_col_key_test].[ID1]>N'17') 
    AND ([two_col_key_test].[ID1]<N'18' OR [two_col_key_test].[ID1]>N'18') 
    AND ([two_col_key_test].[ID1]<N'19' OR [two_col_key_test].[ID1]>N'19') 
    AND ([two_col_key_test].[ID1]<N'2' OR [two_col_key_test].[ID1]>N'2') 
    AND ([two_col_key_test].[ID1]<N'20' OR [two_col_key_test].[ID1]>N'20') 
    AND ([two_col_key_test].[ID1]<N'3' OR [two_col_key_test].[ID1]>N'3') 
    AND ([two_col_key_test].[ID1]<N'4' OR [two_col_key_test].[ID1]>N'4') 
    AND ([two_col_key_test].[ID1]<N'5' OR [two_col_key_test].[ID1]>N'5') 
    AND ([two_col_key_test].[ID1]<N'6' OR [two_col_key_test].[ID1]>N'6') 
    AND ([two_col_key_test].[ID1]<N'7' OR [two_col_key_test].[ID1]>N'7') 
    AND ([two_col_key_test].[ID1]<N'8' OR [two_col_key_test].[ID1]>N'8') 
    AND ([two_col_key_test].[ID1]<N'9' OR [two_col_key_test].[ID1]>N'9')

[![seek residual][2]][2]

Rewriting each inequality, e.g.:

    [ID1] <> N'1'  ->  [ID1] < N'1' OR [ID1] > N'1'

...is counterproductive here. Collation-aware string comparisons are expensive. Doubling the number of comparisons explains most of the difference in CPU time you see.

You can see this more clearly by disabling the pushing of non-sargable predicates with undocumented trace flag 9130. That will show the residual as a separate Filter, with performance information you can inspect separately:

[![scan][3]][3]

[![seek][4]][4]
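
For example, the plans above can be reproduced by running the seek query from the question with the trace flag added at the statement level (a sketch; `QUERYTRACEON` requires sysadmin membership, and 9130 is undocumented, so keep this to a test instance):

    SELECT ID1, ID2
    FROM two_col_key_test
    WHERE ID1 NOT IN
    (
        N'1', N'2', N'3', N'4', N'5',
        N'6', N'7', N'8', N'9', N'10',
        N'11', N'12', N'13', N'14', N'15',
        N'16', N'17', N'18', N'19', N'20'
    )
    AND (ID1 = N'FILLER TEXT' AND ID2 >= N'' OR (ID1 > N'FILLER TEXT'))
    ORDER BY ID1, ID2 OFFSET 12000000 ROWS FETCH FIRST 1 ROW ONLY
    OPTION (MAXDOP 1, QUERYTRACEON 9130); -- residual shows up as a separate Filter operator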

This will also highlight the slight cardinality misestimate on the seek, which explains why the optimizer chose the seek over the scan in the first place (it expected the seeking portion to eliminate some rows).

While the inequality rewrite can enable (possibly filtered) index matching, to make the best use of a b-tree index's seeking ability, it would be better to revert the expansion afterwards when both halves end up in the residual. You could suggest this as an improvement on the [SQL Server feedback site][5].

Note also that the original ("legacy") cardinality estimation model happens to select a scan by default for this query. 
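
If you want to test that without changing the database compatibility level or its scoped configuration, one option (a sketch; the hint is available from SQL Server 2016 SP1 onward) is to force the legacy CE for just this statement by changing the query's `OPTION` clause to:

    OPTION (MAXDOP 1, USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));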

  [1]: https://i.stack.imgur.com/Fnsv4.png
  [2]: https://i.stack.imgur.com/3EJns.png
  [3]: https://i.stack.imgur.com/8qK1l.png
  [4]: https://i.stack.imgur.com/H6nPo.png
  [5]: https://feedback.azure.com/forums/908035-sql-server
