Late contradiction handling - Databases

TopAnswers Databases
Meta
Databases
TeX
Code Golf
APL
C++
.net
db<>fiddle
Java
*nix
PHP
PowerShell
Python
Rust
टेक्-मराठी
Typst
Web Client Dev
Web Server Dev
Late contradiction handling
Blog
sql-server add tag
i-one
In my [previous](https://topanswers.xyz/databases?q=1870) post, internals of the contradiction handling mechanism were touched upon. As a short reminder, detection and processing are two separate aspects of this mechanism. Detection is widely available in various compilation stages. Processing happens at simplification stage typically, but may also happen at a later (plan search) stage.

Early contradiction handling is the preferable course of events as it reduces plan search space and saves search resources for more useful work. Nonetheless, late contradiction handling might be a more interesting matter because it has some limitations in comparison to the early one.

This post demonstrates two examples of late contradiction handling, both use [AdventureWorks](https://github.com/Microsoft/sql-server-samples/releases/tag/adventureworks) sample database. Specifically, the AdventureWorks2019 on an instance of SQL Server 2019 (RTM-CU12) was used.

## **Example 1**

Consider the following query

```sql
DECLARE @list TABLE ([ID] int NOT NULL PRIMARY KEY);
DECLARE @filter TABLE ([MakeFlag] bit NULL);
INSERT INTO @list ([ID]) VALUES (1);
INSERT INTO @filter ([MakeFlag]) VALUES (0);

SELECT u.[ProductID], u.[Name]
FROM (
	    SELECT [ProductID], [Name], [MakeFlag]
	    FROM [Production].[Product]
        UNION ALL
	    SELECT 0, N'None', NULL
    ) u
    JOIN @list lst ON lst.[ID] = u.[ProductID]
    JOIN @filter f ON f.[MakeFlag] = u.[MakeFlag];
```

It contains an _implicit contradiction_. Table `@filter` is joined to the union by `f.[MakeFlag] = u.[MakeFlag]` condition, which is never True for the second part of the union due to the `NULL` value in the `MakeFlag` column. In this case, implicit contradiction cannot be handled early. To figure it out the optimizer has to split the join-to-union into the union-of-joins. Effectively, such splitting is the exploration of an alternative (implying cost estimation), and so should be performed during the plan search.

This is the execution plan generated for the query

![plan_union.png](/image?hash=70ac3bfd3c9c9bc1b2d1ffb4bc310b694e5f82a1b2768b6d6e476fce2622b793)

The Constant Scan here executes one time and returns zero rows. It is an artifact of incomplete contradiction handling. During plan search, the optimizer evaluates rewriting the subject query as

```sql
SELECT p.[ProductID], p.[Name]
FROM [Production].[Product] p
    JOIN @list lst ON lst.[ID] = p.[ProductID]
    JOIN @filter f ON f.[MakeFlag] = p.[MakeFlag]
UNION ALL
SELECT v.[ProductID], v.[Name]
FROM (VALUES (0, N'None', NULL)) v([ProductID], [Name], [MakeFlag])
    JOIN @list lst ON lst.[ID] = v.[ProductID]
    JOIN @filter f ON f.[MakeFlag] = v.[MakeFlag];
```

This rewrite makes contradiction in the second part of the union recognizable by the optimizer. The contradicting part is replaced then by an empty dataset. The next step сould be collapsing the union with the empty input, but the optimizer does not do this because it is sure that found execution plan is good enough.

If we perform the same rewrite manually, we get cleaner execution plan

![plan_union_rewrite.png](/image?hash=edb3848bb4c65fcedec6e65fe1186bc491927684b97beff5da55bd748c18fd6a)

For the rewritten query, the union is removed during simplification in fact, and what goes to the plan search is just

```sql
SELECT p.[ProductID], p.[Name]
FROM [Production].[Product] p
    JOIN @list lst ON lst.[ID] = p.[ProductID]
    JOIN @filter f ON f.[MakeFlag] = p.[MakeFlag];
```

The only sign that a union was there is the (harmless) Compute Scalar operation containing columns named `Union1010` and `Union1011` that project columns of the `Product` table

```none
[Union1010] = Scalar Operator([AdventureWorks2019].[Production].[Product].[ProductID] as [p].[ProductID])
[Union1011] = Scalar Operator([AdventureWorks2019].[Production].[Product].[Name] as [p].[Name])
```

Now see how the execution plan for the subject query changes with more rows in the `@list` table

```
INSERT INTO @list ([ID])
SELECT TOP (100) [ProductID]
FROM [Production].[Product];
```

and the `RECOMPILE` hint added (for better estimations on the table variable)

![plan_union_more_rows.png](/image?hash=abc3a02f5b0b66a3fbf81aea2ace075477ad2696645083a053918dcd53a1fdc6)

So, it's not that the optimizer cannot handle contradiction better by itself in this case --- it can, but whether it will depends on the optimization path chosen.

## **Example 2**

The following query assesses amount of sales over all orders and orders in a specific status

```sql
SELECT
    [Amount] = SUM(od.[LineTotal]),
    [StatusAmount] = SUM(IIF(oh.[Status] = 10, od.[LineTotal], NULL))
FROM [Sales].[SalesOrderDetail] od
    JOIN [Sales].[SalesOrderHeader] oh ON oh.[SalesOrderID] = od.[SalesOrderID];
```

This is the result returned

| Amount | StatusAmount |
| ------:| ------------:|
| 109846381.399888 | NULL |

And this is the execution plan

![plan_sum_join.png](/image?hash=86e88af4e98f5ee1347f19a9b8103f2a4de3ea46d7b209077258e4d036cee8b0)

Table `SalesOrderHeader` has check constraint `CK_SalesOrderHeader_Status` defined, only allowing `[Status] BETWEEN 0 AND 8`. This means that although the table is scanned entirely and its rows flow to the aggregation, none of them contribute to the result (moreover, no rows can). What we have here -- is the _hidden contradiction_. The same result could be achieved without joining this table at all.

The next query is semantically equivalent rewrite (allowed by the schema) that makes contradiction perceivable by the optimizer

```sql
SELECT
    [Amount] = SUM(od.[LineTotal]),
    [StatusAmount] = SUM(IIF(oh.[Status] = 10, od.[LineTotal], NULL))
FROM [Sales].[SalesOrderDetail] od
    LEFT JOIN [Sales].[SalesOrderHeader] oh ON oh.[SalesOrderID] = od.[SalesOrderID] AND oh.[Status] = 10;
```

This is the execution plan generated for the rewritten query

![plan_sum_left_join.png](/image?hash=103d99563fa74094efa8ef5ad7016f46243b317ca7c6c486229b5aa9184e2a7b)

Only the `SalesOrderDetail` table is accessed and no join at all, almost ideal execution plan! This, however, is not the late contradiction handling yet, because the unnecessary left join is removed early (during simplification).

---

***Side Note:*** "Almost ideal" is because the Compute Scalar supplying rows to the Stream Aggregate contains following

```none
[Expr1004] = Scalar Operator(od.[LineTotal])
[Expr1005] = Scalar Operator(CASE WHEN NULL THEN od.[LineTotal] ELSE NULL END)
```

This `CASE` expression could be folded to `NULL`, but it did not. Folding is applied extensively only at the very early compilation stages (at that moment the condition was `oh.[Status] = 10` still). At the later stages, folding is applied (if at all) very selectively.

---

Optimizer is capable of rewriting the subject query to the left join by itself actually. One can use (poorly documented) trace flag 2301 for this

```sql
SELECT
    [Amount] = SUM(od.[LineTotal]),
    [StatusAmount] = SUM(IIF(oh.[Status] = 10, od.[LineTotal], NULL))
FROM [Sales].[SalesOrderDetail] od
    JOIN [Sales].[SalesOrderHeader] oh ON oh.[SalesOrderID] = od.[SalesOrderID]
OPTION (QUERYTRACEON 2301);
```

This is the execution plan

![plan_sum_join_TF.png](/image?hash=3cceccc481aa0f6e62d4709b745643b78271953bd63c4402280991c3b8b3d687)

Again, the Constant Scan here is the result of incomplete contradiction handling, it executes one time and returns zero rows. During plan search, the optimizer evaluates rewriting query to the left join -- similarly to how we did manually little earlier (this is conducted by the `GbAggIFFPredicateUnderJoin` rule). Then it detects contradiction (left-join's `ON` predicate is False due to the constraint) and replaces right side of the left join by an empty dataset. After that, the right join is preferred over the left join for the cost reason.

---

***Side Note:*** The `GbAggIFFPredicateUnderJoin` is one of those (very few) rules that aren't considered by default and only become available under a trace flag. It explores aggregation over an inner join and comes with the following requirements and limitations:

- The join should be between tables connected by a foreign key reference, and the join predicate should match columns on the two sides of the foreign key by means of `EQ` or `IS` comparison. The foreign key column must not be nullable. Multi-column keys don't suit.
- Sensitive to the join order, though wrong order is forgivable normally by virtue of join commute explorations. For this reason, may not succeed if `FORCE ORDER` is specified.
- The expression being aggregated should be a `CASE` or `IIF` (or another conditional expression represented internally as `ScaOp_IIF`). Condition of the expression should involve a column of the primary/unique-key-side table.
- At most one variation of the conditional expression aggregation (seemingly).
- Only available in Quick Plan and Full Optimization search stages (aka *"search 1"* and *"search 2"*). So, cannot be applied if optimizer decides to enter Transaction Processing stage (*"search 0"*) and stop after.

I haven't tested this extensively, so there may be more nuances.

---

So far we have seen incomplete contradiction handling showing up as the zero-rows Constant Scan being executed one time. Depending on the execution plan shape, it can show up as being executed multiple times as well. The following query demonstrates it

```sql
SELECT
    [Count] = COUNT_BIG(*),
    [StatusCount] = COUNT_BIG(IIF(oh.[Status] = 10, 1, NULL))
FROM [Sales].[SalesOrderDetail] od
    JOIN [Sales].[SalesOrderHeader] oh ON oh.[SalesOrderID] = od.[SalesOrderID]
OPTION (QUERYTRACEON 2301);
```

This is the execution plan

![plan_count_join_TF_actual.png](/image?hash=5ea09799548eea0dcbcb78f6b0905eaa1e86c10b73fa6b9e067d7c76b63415c8)

Here the Constant Scan on the inner side of the Nested Loops returns zero rows 121317 times.

Unlike the previous example where the optimizer was able to handle contradiction completely via another optimization path, this is impossible here. Removal of the left join with the empty right side (as well as the right join with the empty left side) is not available during plan search, only during simplification.

## **Summary and Final Thoughts**

Optimizer tries to detect contradictions as early as it can and process them before the start of the cost-based optimization. It may not be possible if a contradiction is not evident enough (from the optimizer's point of view).

Contradictions that survive simplification get additional chance to be handled during plan search. Upon reaching plan search stages, there are following possibilities for a contradiction:

- remain unhandled --- the worst case. Unhandled contradictions (not taking into account hidden ones, which sometimes are hard to recognize even for a human, not just the optimizer) appear in the execution plan as branches that can never produce any rows. This may be accompanied by the missed optimization opportunities and negative performance impact (like in the [previous](https://topanswers.xyz/databases?q=1870) post, for example).
- be handled incompletely --- softer than previous. Incompletely handled contradictions appear in the execution plan as zero-rows Constant Scan operations being executed one or multiple times. Performance impact may vary from negligible to measurable, depending on the plan shape and amount of data processed.
- be fully handled --- the best of these. While the quality of the generated execution plan can be as good as if this was early handling, I think that still there is a possibility in some circumstances to get "good enough" execution plan that isn't that, because the optimizer spends search moves on contradiction processing.

There are some key distinctions between early and late handling. If a contradiction is detected early it will be processed completely, whereas detected late may not be (in some cases the optimizer doesn't care about, in others it is just limited in doing this). Late handling, however, has the advantage of detecting implicit contradictions that weren't detected early through exploration of more logical alternatives.

If you encounter attributes of an unhandled or incompletely handled contradiction in the execution plan, it can make sense to investigate deeper. Although not necessarily it is, contradiction can be a sign that something is wrong (schema, query, application logic). I'm missing (and I wish there would be) a convenient way to know about the fact of handled contradiction (and the stage at which it happened). Perhaps a warning added to the compiled plan (like for implicit type conversion or missing index) or an extended event (that is less annoying probably).