I know its something that should be avoided for performance reasons, but am trying to show a condition where it appears as a demo on how to make sure it does not appear.
However, I end up with a missing index warning, yet the optimizer chooses not to create a temporary index.
The query I am using is
SELECT
z.a
FROM dbo.t5 AS z WITH(INDEX(0))
WHERE
EXISTS
(
SELECT y.a
FROM dbo.t4 AS y
WHERE y.a = z.a
)
OPTION (MAXDOP 1);
Table schemas are:
CREATE TABLE dbo.t4
(
a integer NULL,
b varchar(1000) NULL,
p varchar(100) NULL
);
CREATE TABLE dbo.t5
(
a integer NULL,
b varchar(1000) NULL
);
CREATE UNIQUE CLUSTERED INDEX c1
ON dbo.t5 (a);
Both tables have 10,000 rows, which you can simulate with:
UPDATE STATISTICS dbo.t4
WITH
ROWCOUNT = 10000,
PAGECOUNT = 1000;
UPDATE STATISTICS dbo.t5
WITH
ROWCOUNT = 10000,
PAGECOUNT = 1000;
The query plan is:
![default plan][1]
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.2" Build="11.0.2218.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="1" StatementEstRows="5532.16" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.407384" StatementText="select a from t5 z WITH(INDEX(0)) where exists (select a from t4 where a=z.a )" StatementType="SELECT" QueryHash="0x1B882FCEA34AEAF4" QueryPlanHash="0x1B276DC04B718F7C" RetrievedFromCache="true">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="1" MemoryGrant="2912" CachedPlanSize="32" CompileTime="10" CompileCPU="10" CompileMemory="296">
<MissingIndexes>
<MissingIndexGroup Impact="82.4536">
<MissingIndex Database="[planoper]" Schema="[dbo]" Table="[t4]">
<ColumnGroup Usage="EQUALITY">
<Column Name="[a]" ColumnId="1" />
</ColumnGroup>
</MissingIndex>
</MissingIndexGroup>
</MissingIndexes>
<MemoryGrantInfo SerialRequiredMemory="1024" SerialDesiredMemory="2912" RequiredMemory="1024" DesiredMemory="2912" RequestedMemory="2912" GrantWaitTime="0" GrantedMemory="2912" MaxUsedMemory="896" />
<OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="104846" EstimatedPagesCached="11834" EstimatedAvailableDegreeOfParallelism="2" />
<RelOp AvgRowSize="11" EstimateCPU="0.228447" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="5532.16" LogicalOp="Left Semi Join" NodeId="0" Parallel="false" PhysicalOp="Hash Match" EstimatedTotalSubtreeCost="0.407384">
<OutputList>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t5]" Alias="[z]" Column="a" />
</OutputList>
<MemoryFractions Input="1" Output="1" />
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10000" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<Hash>
<DefinedValues />
<HashKeysBuild>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t5]" Alias="[z]" Column="a" />
</HashKeysBuild>
<HashKeysProbe>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t4]" Column="a" />
</HashKeysProbe>
<ProbeResidual>
<ScalarOperator ScalarString="[planoper].[dbo].[t4].[a]=[planoper].[dbo].[t5].[a] as [z].[a]">
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t4]" Column="a" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t5]" Alias="[z]" Column="a" />
</Identifier>
</ScalarOperator>
</Compare>
</ScalarOperator>
</ProbeResidual>
<RelOp AvgRowSize="11" EstimateCPU="0.0110785" EstimateIO="0.0565368" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="10000" LogicalOp="Clustered Index Scan" NodeId="1" Parallel="false" PhysicalOp="Clustered Index Scan" EstimatedTotalSubtreeCost="0.0676153" TableCardinality="10000">
<OutputList>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t5]" Alias="[z]" Column="a" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10000" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<IndexScan Ordered="false" ForcedIndex="true" ForceScan="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t5]" Alias="[z]" Column="a" />
</DefinedValue>
</DefinedValues>
<Object Database="[planoper]" Schema="[dbo]" Table="[t5]" Index="[c1]" Alias="[z]" IndexKind="Clustered" />
</IndexScan>
</RelOp>
<RelOp AvgRowSize="11" EstimateCPU="0.011157" EstimateIO="0.100162" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="10000" LogicalOp="Table Scan" NodeId="2" Parallel="false" PhysicalOp="Table Scan" EstimatedTotalSubtreeCost="0.111319" TableCardinality="10000">
<OutputList>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t4]" Column="a" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="10000" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<TableScan Ordered="false" ForcedIndex="false" ForceScan="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[planoper]" Schema="[dbo]" Table="[t4]" Column="a" />
</DefinedValue>
</DefinedValues>
<Object Database="[planoper]" Schema="[dbo]" Table="[t4]" IndexKind="Heap" />
</TableScan>
</RelOp>
</Hash>
</RelOp>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
It even tells me to create this index:
USE [planoper];
GO
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[t4] ([a]);
[1]: https://i.stack.imgur.com/TahmP.png
One way to get an index spool to appear naturally is to express the requirement using slightly different syntax:
SELECT DISTINCT
z.a
FROM dbo.t5 AS z
JOIN dbo.t4 AS y ON
y.a >= z.a AND y.a <= z.a
OPTION (LOOP JOIN, MAXDOP 1, FORCE ORDER);
This produces an execution plan like:
![Join inequality plan][1]
[Rewriting the equality as a pair of equivalent inequalities][2] encourages the use of an index spool, though the spooled predicate is not exactly what you were after, the semantics are ultimately the same.
Another way is to abuse `TOP`:
```
SELECT
z.a
FROM dbo.t5 AS z
WHERE
EXISTS
(
SELECT TOP ((SELECT 100)) PERCENT y.a
FROM dbo.t4 AS y
WHERE y.a = z.a
);
```
[![top 100 percent plan][3]][3]
There is no easy way to introduce the desired index spool using the original syntax; however, that's not to say it is impossible. Since you only need this for a demo, and will ***not be using this anywhere near a production system***, I will show you another way:
SELECT
z.a
FROM dbo.t5 AS z WITH(INDEX(0))
WHERE
EXISTS
(
SELECT y.a
FROM dbo.t4 AS y
WHERE y.a = z.a
)
OPTION (MAXDOP 1, LOOP JOIN, QUERYTRACEON 9114);
The execution plan is:
![Trace flag plan][4]
The index spool predicate is as desired:
```none
Seek Keys[1]: Prefix: [dbo].[t4].a = [dbo].[t5].[a] as [z].[a]
```
You will not be able to use this plan in a `USE PLAN` hint because the optimizer would not normally consider it.
Further reading:
* [Nested Loops Joins and Performance Spools][5]
* https://dba.stackexchange.com/q/239865/1192
[1]: https://i.stack.imgur.com/hqaBT.png
[2]: https://www.sql.kiwi/2010/12/heaps-of-trouble.html
[3]: https://i.stack.imgur.com/WfZYX.png
[4]: https://i.stack.imgur.com/qY1hZ.png
[5]: https://sqlperformance.com/2019/09/sql-performance/nested-loops-joins-performance-spools