Now, I am faced with the problem of the logic of cardinality estimation that is not quite clear for me in a seemingly rather simple situation.
I encountered this situation at my work, therefore, for privacy reasons, I will provide only a general description of the problem below, however, for a more detailed analysis, I simulated this problem in the AdventureWorksDW training base.
There is a query of the following form:
```
SELECT <some columns>
FROM <some dates table>
CROSS APPLY(
SELECT
<some p columns>
FROM <some table> p
WHERE p.StartDate <= Dates.d
AND p.EndDate >= Dates.d
) t
```
[![enter image description here][1]][1]
As you can see from the execution plan presented above, the cardinality estimator estimated the estimated number of rows in the Index Seek operation at 17,884,200 (corresponding to 2,980,700 per row from the outer part of the NL), which is quite close to the actual number.
Now I will modify the query and add to CROSS APPLY LEFT OUTER JOIN:
```
SELECT <some columns t>
FROM <some dates table>
CROSS APPLY(
SELECT
<some p columns>
<some columns f>
FROM <some table> p
LEFT JOIN <some table> f ON p.key = f.key
AND f.date = Dates.d
WHERE p.StartDate <= Dates.d
AND p.EndDate >= Dates.d
) t
```
This query gives the following plan:
[![enter image description here][2]][2]
Seeing the logical form of the query, it is logical to assume that the expected number of rows of the Index Seek operation will remain the same, although I understand that the route for finding the plan is different, however, it would seem that the part highlighted in red has not changed, the same predicates, etc. , but Index Seek's estimate is now 664,506 (corresponding to 110,751 per line from the external part of NL), which is a gross mistake and in the production environment can cause a serious tempdb spill data.
The above queries were executed on an instance of Sql Server 2012 (SP4) (KB4018073) - 11.0.7001.0 (x64).
To get more details and simplify the analysis, I simulated this problem in the AdventureWorksDW2017 database on an instance of SQL Server 2019 (RTM) - 15.0.2000.5 (X64), but I execute queries with the 9481 trace flag turned on to simulate a system with cardinality estimator version 70.
Below is a query with left outer join.
```
DECLARE @db DATE = '20130720'
DECLARE @de DATE = '20130802'
;WITH Dates AS(
SELECT [FullDateAlternateKey] AS d
FROM [AdventureWorksDW2017].[dbo].[DimDate]
WHERE [FullDateAlternateKey] BETWEEN @db AND @de
)
SELECT *
FROM Dates
CROSS APPLY(
SELECT
p.[ProductAlternateKey]
,f.[OrderQuantity]
FROM [AdventureWorksDW2017].[dbo].[DimProduct] p
LEFT JOIN [AdventureWorksDW2017].[dbo].[FactInternetSales] f ON f.ProductKey = p.ProductKey
AND f.[OrderDate] = Dates.d
WHERE p.StartDate <= Dates.d
AND ISNULL(p.EndDate, '99991231') >= Dates.d
) t
OPTION(QUERYTRACEON 9481 /*force legacy CE*/)
```
It is also worth noting that the following index was created on the DimProduct table:
```
CREATE NONCLUSTERED INDEX [Date_Indx] ON [dbo].[DimProduct]
(
[StartDate] ASC,
[EndDate] ASC
)
INCLUDE([ProductAlternateKey])
```
The query gives the following query plan: (1)
[![enter image description here][3]][3]
As you can see, the part of the query highlighted in red gives an estimate of 59,754 (~ 182 per row).
Now I’ll demonstrate a query plan without a left outer join. (2)
[![enter image description here][4]][4]
As you can see the part of the query highlighted in red gives a score of 97 565 (~ 297 per row), the difference is not so great however, the cardinality score for the filter (3) operator is significantly different ~ 244 per row versus ~ 54 in the query with left outer join.
(3) – Filter predicate:
```
isnull([AdventureWorksDW2017].[dbo].[DimProduct].[EndDate] as [p].[EndDate],'9999-12-31 00:00:00.000')>=[AdventureWorksDW2017].[dbo].[DimDate].[FullDateAlternateKey]
```
Trying to plunge deeper, I looked at the trees of physical operators presented above plans.
Below are the most important parts of the trace of undocumented flags 8607 and 8612.
For plan (2):
```
PhyOp_Apply lookup TBL: AdventureWorksDW2017.dbo.DimProduct
…
PhyOp_Range TBL: AdventureWorksDW2017.dbo.DimProduct(alias TBL: p)(6) ASC Bmk ( QCOL: [p].ProductKey) IsRow: COL: IsBaseRow1002 [ Card=296.839 Cost(RowGoal 0,ReW 0,ReB 327.68,Dist 328.68,Total 328.68)= 0.174387 ](Distance = 2)
ScaOp_Comp x_cmpLe
ScaOp_Identifier QCOL: [p].StartDate
ScaOp_Identifier QCOL: [AdventureWorksDW2017].[dbo].[DimDate].FullDateAlternateKey
```
For plan (1):
```
PhyOp_Apply (x_jtInner)
…
PhyOp_Range TBL: AdventureWorksDW2017.dbo.DimProduct(alias TBL: p)(6) ASC Bmk ( QCOL: [p].ProductKey) IsRow: COL: IsBaseRow1002 [ Card=181.8 Cost(RowGoal 0,ReW 0,ReB 327.68,Dist 328.68,Total 328.68)= 0.132795 ](Distance = 2)
ScaOp_Comp x_cmpLe
ScaOp_Identifier QCOL: [p].StartDate
ScaOp_Identifier QCOL: [AdventureWorksDW2017].[dbo].[DimDate].FullDateAlternateKey
```
As you can see, the optimizer selects various implementations of the Apply operator, PhyOp_Apply lookup in (2) and PhyOp_Apply (x_jtInner) in (1), but I still do not understand what I can extract from this.
I can get the same estimate as in plan (1) by rewriting the original query without left outer join as follows:
```
DECLARE @db DATE = '20130720'
DECLARE @de DATE = '20130802'
;WITH Dates AS(
SELECT [FullDateAlternateKey] AS d
FROM [AdventureWorksDW2017].[dbo].[DimDate]
WHERE [FullDateAlternateKey] BETWEEN @db AND @de
)
SELECT *
FROM Dates
CROSS APPLY(
SELECT TOP(1000000000)
p.[ProductAlternateKey]
FROM [AdventureWorksDW2017].[dbo].[DimProduct] p
WHERE p.StartDate <= Dates.d
AND ISNULL(p.EndDate, '99991231') >= Dates.d
) t
OPTION(QUERYTRACEON 9481 /*force legacy CE*/)
```
Which gives the following plan: (4)
[![enter image description here][5]][5]
[1]: https://i.stack.imgur.com/rzkdO.png
[2]: https://i.stack.imgur.com/SsOXe.png
[3]: https://i.stack.imgur.com/77TVs.png
[4]: https://i.stack.imgur.com/aAHAt.png
[5]: https://i.stack.imgur.com/HPuYI.png
As you can see, the estimation of the area highlighted in red coincides with the plan (1) and the PhyOp_Apply (x_jtInner) operator in the tree of physical operators.
Please help me answer the question, is there a way to influence such an estimation of cardinality, possibly by hints or by changing the query form, etc., and help to understand why the optimizer gives such an estimation in this case.