Optimizer not choosing index union plan - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

Optimizer not choosing index union plan

sql-server add tag

peter (imported from SE)

Why is the following query slow?

    
    select count(*) 

    from [dbo].[mt_dispatch_link] 
       , [dbo].[_mt_dispatch] [_mt_dispatch]

    where   (mt_dispatch_link.contract_id_1 = _mt_dispatch.contract_id 
         and mt_dispatch_link.dispatch_id_1 = _mt_dispatch.dispatch_id)

       or   (mt_dispatch_link.contract_id_2 = _mt_dispatch.contract_id 
         and mt_dispatch_link.dispatch_id_2 = _mt_dispatch.dispatch_id)

![enter image description here][1]


It takes over 10 minutes, and then I tend to stop it at that point. My question is more about how to make sense of the query plan.

Looking at the query plan I can see that the bottom clustered index scan is returning about 250000 records, but the cost is 0% and it is being put into a temporary table.

The top index scan is about 25000 records.

But the cost of 95% comes from the nested join. What conclusions should I draw from this?

The above query plan shows two index scans, does that mean it is doing 25000 + 250000 index scans, or does it mean that it is doing 25000 * 250000 index scans?

If I change the query to this (adding `FORCESEEK`):

    
    select count(*) 

    from [dbo].[mt_dispatch_link] 
       , [dbo].[_mt_dispatch] [_mt_dispatch] 

        WITH (FORCESEEK)

    where   (mt_dispatch_link.contract_id_1 = _mt_dispatch.contract_id 
         and mt_dispatch_link.dispatch_id_1 = _mt_dispatch.dispatch_id)

       or   (mt_dispatch_link.contract_id_2 = _mt_dispatch.contract_id 
         and mt_dispatch_link.dispatch_id_2 = _mt_dispatch.dispatch_id)

I end up with a much better plan and the query runs instantly,

![enter image description here][2]


  [1]: https://i.stack.imgur.com/rhhab.png
  [2]: https://i.stack.imgur.com/Im5zF.png

Why is the query optimiser not coming up with the optimum plan?

Top Answer

Paul White (imported from SE)

The optimizer does not always consider [index-union][1] plans (like the one shown in your second graphic) to resolve disjunctions (`OR` predicates) unless a `FORCESEEK` or `INDEX` hint is specified. This is a heuristic[^fn1] based on some practical considerations:

1. Index union is not often enough a good plan selection for general queries.
2. The number of ways indexes can be combined grows exponentially.

Using a hint changes the way the optimizer searches the space of possible plans. It disables some of the general heuristics and pursues a more goal-orientated strategy.

The optimizer's usual primary goal is to find a good plan quickly. It does not *exhaustively* search for the 'best' plan (even relatively simple queries could take years to compile if it did).

Joins with multiple conditions separated with `OR` have long been problematic. Over the years, the optimizer has added new tricks like converting them to equivalent `UNION` forms, but the transformations available are limited, so it is quite easy to come unstuck.

As far as the query plan is concerned:

1. The first row from DispatchLink causes a full scan of the Dispatch table
2. The result of the scan is stored in an internal *tempdb* worktable (the Table Spool)
3. The join checks every row from the worktable against the full `OR` predicate
4. The next row is fetched from DispatchLink and the process repeats from step 3

If there are 25,000 rows in the Dispatch Link table, the spool will be fully scanned 25,000 times. This is a disaster of course (and without index intersection, the best the optimizer can do is run the whole thing on multiple threads).

Percentage costs in query plans are only the optimizer's *estimates*. They *never* reflect actual execution costs, and are subject to the optimizer's model and will usually bear little resemblance to the 'true' cost of executing the plan on your specific hardware.

Costing numbers are there to be informative, but they should not be taken literally. The particular model the optimizer uses happens to produce pretty good plans for most queries on most systems across the world - that does not mean the model approximates anyone's reality, just that it happens to work reasonably well in practice.

Changing the design so that (Dispatch, Contract) pairs are stored in rows rather than repeated across columns will make the whole index-intersection problem go away. Relational designs with useful constraints and indexes almost always get the best out of the optimizer.

[1]: https://docs.microsoft.com/en-nz/archive/blogs/craigfr/index-union
[^fn1]: *This can be overridden with undocumented trace flag 8726.*

1 Answer