Missed optimisations for ScalarGbAggToTop with MAX(nullable_column) - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

Missed optimisations for ScalarGbAggToTop with MAX(nullable_column)

sql-server add tag

Martin Smith

**Setup**

    CREATE TABLE T1(X INT NULL UNIQUE CLUSTERED);
    CREATE TABLE T2(X INT NULL UNIQUE CLUSTERED);

    INSERT INTO T1 
    OUTPUT INSERTED.X INTO T2
    SELECT TOP 100000 NULLIF(ROW_NUMBER() OVER (ORDER BY 1/0),1) 
    FROM sys.all_objects o1, 
         sys.all_objects o2;
         
**Query**

    WITH CTE AS
    (
    SELECT X FROM T1
    UNION ALL
    SELECT X FROM T2
    )
    SELECT MAX(X)
    FROM CTE
    
**Execution Plan**

![2020-05-22_18-45-05.png](/image?hash=419151a55009d80fbe912e2f89ee2dcf14d2e4eda18f6128acfcac769a22dfe1)
    
For each table this has two backwards index scans - one with an `IS NULL` predicate and the other with an `IS NOT NULL` predicate. Each scan can stop after the first row is found.

There are a few things not to like about this arrangement. 

* That the `IS NULL` and `IS NOT NULL` are implemented with a scan and predicate in the first place rather than a seek.
* Consequently if either all the rows are `NULL` or all `NOT NULL` one of the scans will read all the rows in the table.
* The `IS NULL` scan is backwards ordered despite the fact that `NULL` will appear at the beginning of the index - therefore has to plough through all the `NOT NULL` rows first. If it was ordered forward and a `NULL` appears in the data this would be found after reading one row.

In the real world scenario that spawned this question this involved very large tables with few or no `NULL`s  and lead to an extra 5 billion rows collectively being read across 4 tables ([SO Question](https://stackoverflow.com/q/61957385/73226) and [Paste the Plan](https://www.brentozar.com/pastetheplan/?id=BJvsmdSs8) link). 

My initial question was **Why is the IS NULL branch there in the first place?**. 

For a scalar `MAX` aggregate surely it might as well just ignore these rows and only concern itself with ones matching the `IS NOT NULL` condition?

(adding an explicit `IS NOT NULL` as below does give a more satisfactory plan)

    WITH CTE AS
    (
    SELECT X FROM T1
    UNION ALL
    SELECT X FROM T2
    )
    SELECT MAX(X)
    FROM CTE
    WHERE X IS NOT NULL
 
 Though I realised after some thought that it needs to read the `NULL`s in order to correctly display (or not) the `Warning: Null value is eliminated by an aggregate or other SET operation.` message.
 
Updated Question **Is this as good as it gets with ScalarGbAggToTop?**. 

* Is there any way to coerce it to use seeks instead of scans (`FORCESEEK` can't produce a plan)?
* If it must use scans can they at least be in opposing directions - so the optimal direction for each predicate?
* If this is "as good as it gets" why is this the case? Is it just the transformation happens too late that alternative access paths will never be considered?
* If so couldn't the post optimisation rewrite phase look for scans with seekable predicates on the index used by the scan (that aren't forced scans) and just convert these to seeks? Would this be a sensible request to make?

Top Answer

Paul White

You're quite right about the reason for the separate `NULL` and `NOT NULL` branches.

>* Is there any way to coerce it to use seeks instead of scans (FORCESEEK can't produce a plan)?

No. The transformation is pretty simple: It literally translates a `MIN` or `MAX` to returning the **first** row from an index **scanned** in the **order** required by the aggregate (taking into account the index key ordering).

>* If it must use scans can they at least be in opposing directions - so the optimal direction for each predicate?

Not possible today, but it could be implemented (of course). It would need to be maintained if SQL Server indexes ever allow NULLs ordered last.

>* If this is "as good as it gets" why is this the case? Is it just the transformation happens too late that alternative access paths will never be considered?

As far as I can tell the result of the transform is constrained to use a scan, never a seek.

The `IS NOT NULL` predicate can be matched (though, hilariously, the predicate remains) to a suitable **filtered** index e.g.:

```sql
CREATE INDEX fnn ON dbo.T1 (X) WHERE X IS NOT NULL;
CREATE INDEX fnn ON dbo.T2 (X) WHERE X IS NOT NULL;
```

![NOT NULL filtered index](/image?hash=624229d7f4e0d52ddb8c8da4badaf014f3f09d0bad7f6269aa93af37b1d9d99b)

The same is **not true** for the `IS NULL` predicate, sadly. The failure may be somewhat related to the long-standing issue:

@@@ question 571

Anyway, it seems the implementation did not consider this edge case with nullable values. That's not altogether surprising. It doesn't work properly in other scenarios e.g. with **partitioned indexes**.

>* If so couldn't the post optimisation rewrite phase look for scans with seekable predicates on the index used by the scan (that aren't forced scans) and just convert these to seeks? Would this be a sensible request to make?

To me, it seems better to resolve the underlying issue with the transform, rather than adding a post-optimization rewrite. The information needed to determine that a seekable index is available is likely not available after cost-based optimization. That said, it's usually better to explain the issue and let the engineers worry about how best to implement any fix.

Don't let the long list of unaddressed bugs deter you. You never know when you'll get lucky with an enhancement request, even after they've been ignored for years.

In the meantime, unexpected execution plans are a good clue to the expert tuner that a SQL rewrite may be worth pursuing.

1 Answer