Mikael Eriksson
Doing a where `Col in (select ...)` does not produce a seek plan if all `Col` values are nulls.
Setting a value for one row and updating stats gives you a seek plan.
There are some possible workarounds or rather different behaviors. Check for `not null` seams to take care of it and also doing a join instead of `Col in (`. The behavior also changes if you have a clustered primary key but looks to be the same as for a heap where the primary key is not the clustered key.
Lets focus on one thing (The Heap) at least in the fiddle, is there some light to be shed on this issue?
<>https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=cca05b16f54e6b41491340c8712ee5c3
Top Answer
Paul White
This is an issue I see quite often when the 'new' cardinality estimator is used.
## Preamble
Your `IN` query is parsed by SQL Server as a [`SOME` comparison](https://docs.microsoft.com/en-us/sql/t-sql/language-elements/some-any-transact-sql), pretty much as if you had written:
```
declare @T table(ID int primary key);
insert into @T(ID) values(1);
select T.ID,
T.C
from dbo.T
where T.C = some (select ID from @T)
```
~~~
*** Converted Tree: ***
LogOp_Project QCOL: [dbo].[T].ID QCOL: [dbo].[T].C
LogOp_Select
LogOp_Get TBL: dbo.T
ScaOp_SomeComp 2
ScaOp_Identifier QCOL: [dbo].[T].C
LogOp_Project
LogOp_Get TBL: @T Hints( NOLOCK )
AncOp_PrjList
AncOp_PrjList
~~~
SQL Server cannot work with subqueries in a relational selection directly, so this is transformed to an *apply* (left semi type) with an equality comparison by rule `RemoveSubqInSel`:
~~~
LogOp_Apply (x_jtLeftSemi)
LogOp_Get TBL: dbo.T
LogOp_Select
LogOp_Get TBL: @T @T Hints( NOLOCK )
ScaOp_Comp x_cmpEq
ScaOp_Identifier QCOL: [dbo].[T].C
ScaOp_Identifier QCOL: .ID
~~~
So the starting point of your query is in fact an apply semi join (as desired). It all goes downhill from there.
## Cardinality Estimation
The new CE will sometimes use a [*Simple Join*](https://www.sqlshack.com/join-estimation-internals/) calculator to estimate the selectivity of a join. This approach does not use histograms, but simply assumes that all distinct values on one side of the join are found among the distinct values on the other side. It's not the greatest strategy in the world, but it is fast.
### All nulls
In your particular case (with all nulls) the calculation looks like this:
~~~
Plan for computation:
CSelCalcSimpleJoinWithDistinctCounts (Using base cardinality)
CDVCPlanLeaf
0 Multi-Column Stats, 1 Single-Column Stats, 0 Guesses
CDVCPlanUniqueKey
Loaded histogram for column QCOL: [dbo].[T].C from stats with id 3
Using ambient cardinality 10000 to combine distinct counts:
1
Combined distinct count: 1
Selectivity: 1
Stats collection generated:
CStCollJoin(ID=3, CARD=10000 x_jtLeftSemi)
CStCollBaseTable(ID=1, CARD=10000 TBL: dbo.T)
CStCollBlackBox(ID=2, CARD=1)
End selectivity computation
~~~
In short, it sees one distinct value in the statistics for the real table (null), and one row (with an unknown value) in the table variable.
Somewhat optimistically, the *Simple Join* algorithm assumes these distinct values match, hence **100% selectivity**. On that basis, a *Merge Join* has a lower implementation cost than *Nested Loops Apply*.
### Nulls and a -1
When you introduce a second distinct value (-1):
~~~
Plan for computation:
CSelCalcSimpleJoinWithDistinctCounts (Using base cardinality)
CDVCPlanLeaf
0 Multi-Column Stats, 1 Single-Column Stats, 0 Guesses
CDVCPlanUniqueKey
Loaded histogram for column QCOL: [dbo].[T].C from stats with id 3
Using ambient cardinality 10000 to combine distinct counts:
2
Combined distinct count: 2
Selectivity: 0.5
Stats collection generated:
CStCollJoin(ID=3, CARD=5000 x_jtLeftSemi)
CStCollBaseTable(ID=1, CARD=10000 TBL: dbo.T)
CStCollBlackBox(ID=2, CARD=1)
End selectivity computation
~~~
There is now one distinct value in the table variable, and two in the real table, so the selectivity estimate is **50%**. This is enough to make the apply option cheaper than merge (though still a poor estimate).
### Workarounds
1. Use the original cardinality estimator, e.g.:
USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')
2. Use undocumented model variation trace flag 9473 to allow zero selectivity.
Only the first option would be supported.
---
Related:
@@@ answer 1497