The following is a simplification of a performance problem encountered with the Query Store:
CREATE TABLE #tears
(
plan_id bigint NOT NULL
);
INSERT #tears (plan_id)
VALUES (1);
SELECT
T.plan_id
FROM #tears AS T
LEFT JOIN sys.query_store_plan AS QSP
ON QSP.plan_id = T.plan_id;
The `plan_id` column is documented as being the primary key of [`sys.query_store_plan`][1], but the execution plan does not use [join elimination][2] as would be expected:
0. No attributes are being projected from the DMV.
0. The DMV primary key `plan_id` cannot duplicate rows from the temporary table
0. A `LEFT JOIN` is used, so no rows from `T` can be eliminated.
[Execution plan][3]
[![plan graphic][4]][4]
Why is this, and what can be done to obtain join elimination here?
[1]: https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-query-store-plan-transact-sql?view=sql-server-2017
[2]: https://blogs.msmvps.com/robfarley/2008/11/08/join-simplification-in-sql-server/
[3]: https://www.brentozar.com/pastetheplan/?id=HyObYM_w7
[4]: https://i.stack.imgur.com/VHXax.png
The documentation is a little misleading. The DMV is a non-materialized view, and does not have a primary key as such. The underlying definitions are a little complex but a simplified definition of `sys.query_store_plan` is:
CREATE VIEW sys.query_store_plan AS
SELECT
PPM.plan_id
-- various other attributes
FROM sys.plan_persist_plan_merged AS PPM
LEFT JOIN sys.syspalvalues AS P
ON P.class = 'PFT'
AND P.[value] = plan_forcing_type;
Further, `sys.plan_persist_plan_merged` is also a view, though one needs to connect via the Dedicated Administrator Connection to see its definition. Again, simplified:
CREATE VIEW sys.plan_persist_plan_merged AS
SELECT
P.plan_id as plan_id,
-- various other attributes
FROM sys.plan_persist_plan P
-- NOTE - in order to prevent potential deadlock
-- between QDS_STATEMENT_STABILITY LOCK and index locks
WITH (NOLOCK)
LEFT JOIN sys.plan_persist_plan_in_memory PM
ON P.plan_id = PM.plan_id;
The indexes on `sys.plan_persist_plan` are:
``` none
╔════════════════════════╦══════════════════════════════════════╦═════════════╗
║ index_name ║ index_description ║ index_keys ║
╠════════════════════════╬══════════════════════════════════════╬═════════════╣
║ plan_persist_plan_cidx ║ clustered, unique located on PRIMARY ║ plan_id ║
║ plan_persist_plan_idx1 ║ nonclustered located on PRIMARY ║ query_id(-) ║
╚════════════════════════╩══════════════════════════════════════╩═════════════╝
```
So `plan_id` is constrained to be unique on `sys.plan_persist_plan`.
Now, `sys.plan_persist_plan_in_memory` is a streaming table-valued function, presenting a tabular view of data only held in internal memory structures. As such, it does not have any unique constraints.
At heart, the query being executed is therefore equivalent to:
DECLARE @t1 table (plan_id integer NOT NULL);
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT
T1.plan_id
FROM @t1 AS T1
LEFT JOIN
(
SELECT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id
) AS Q1
ON Q1.plan_id = T1.plan_id;
...which does not produce join elimination:
[![no join elimination][1]][1]
Getting right to the core of the issue, the problem is the inner query:
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id;
...clearly the left join might result in rows from `@t2` being duplicated because `@t3` has no uniqueness constraint on `plan_id`. Therefore, the join cannot be eliminated:
[![inner query][2]][2]
To workaround this, we can explicitly tell the optimizer that we do not require any duplicate `plan_id` values:
```
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);
SELECT DISTINCT -- added
T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3
ON T3.plan_id = T2.plan_id;
```
The outer join to `@t3` can now be eliminated:
[![join eliminated][3]][3]
Applying that to the real query:
```
SELECT DISTINCT
T.plan_id
FROM #tears AS T
LEFT JOIN sys.query_store_plan AS QSP
ON QSP.plan_id = T.plan_id;
```
Equally, we could add `GROUP BY T.plan_id` instead of the `DISTINCT`. Anyway, the optimizer can now correctly reason about the `plan_id` attribute all the way down through the nested views, and eliminate both outer joins as desired:
[![Both joins eliminated][4]][4]
Note that making `plan_id` unique in the temporary table would not be sufficient to obtain join elimination, since it would not preclude incorrect results. We must explicitly reject duplicate `plan_id` values from the final result to allow the optimizer to work its magic here.
[1]: https://i.stack.imgur.com/iVQdv.png
[2]: https://i.stack.imgur.com/19qlT.png
[3]: https://i.stack.imgur.com/EyM0S.png
[4]: https://i.stack.imgur.com/b0yx2.png