sql-server add tag
Paul White (imported from SE)
The following is a simplification of a performance problem encountered with the Query Store:

    CREATE TABLE #tears
    (
        plan_id bigint NOT NULL
    );
    
    INSERT #tears (plan_id) 
    VALUES (1);
    
    SELECT
        T.plan_id
    FROM #tears AS T
    LEFT JOIN sys.query_store_plan AS QSP
        ON QSP.plan_id = T.plan_id;

The `plan_id` column is documented as being the primary key of [`sys.query_store_plan`][1], but the execution plan does not use [join elimination][2] as would be expected:

0. No attributes are being projected from the DMV.
0. The DMV primary key `plan_id` cannot duplicate rows from the temporary table
0. A `LEFT JOIN` is used, so no rows from `T` can be eliminated.

[Execution plan][3]

[![plan graphic][4]][4]

Why is this, and what can be done to obtain join elimination here?

  [1]: https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/sys-query-store-plan-transact-sql?view=sql-server-2017
  [2]: https://blogs.msmvps.com/robfarley/2008/11/08/join-simplification-in-sql-server/
  [3]: https://www.brentozar.com/pastetheplan/?id=HyObYM_w7
  [4]: https://i.stack.imgur.com/VHXax.png
Top Answer
Paul White (imported from SE)
The documentation is a little misleading. The DMV is a non-materialized view, and does not have a primary key as such. The underlying definitions are a little complex but a simplified definition of `sys.query_store_plan` is:

    CREATE VIEW sys.query_store_plan AS
    SELECT
        PPM.plan_id
        -- various other attributes
    FROM sys.plan_persist_plan_merged AS PPM
    LEFT JOIN sys.syspalvalues AS P
        ON P.class = 'PFT' 
        AND P.[value] = plan_forcing_type;

Further, `sys.plan_persist_plan_merged` is also a view, though one needs to connect via the Dedicated Administrator Connection to see its definition. Again, simplified:

    CREATE VIEW sys.plan_persist_plan_merged AS   
    SELECT     
        P.plan_id as plan_id,    
        -- various other attributes
    FROM sys.plan_persist_plan P 
        -- NOTE - in order to prevent potential deadlock
        -- between QDS_STATEMENT_STABILITY LOCK and index locks   
        WITH (NOLOCK) 
    LEFT JOIN sys.plan_persist_plan_in_memory PM
        ON P.plan_id = PM.plan_id;

The indexes on `sys.plan_persist_plan` are:

``` none
╔════════════════════════╦══════════════════════════════════════╦═════════════╗
║       index_name       ║          index_description           ║ index_keys  ║
╠════════════════════════╬══════════════════════════════════════╬═════════════╣
║ plan_persist_plan_cidx ║ clustered, unique located on PRIMARY ║ plan_id     ║
║ plan_persist_plan_idx1 ║ nonclustered located on PRIMARY      ║ query_id(-) ║
╚════════════════════════╩══════════════════════════════════════╩═════════════╝
```

So `plan_id` is constrained to be unique on `sys.plan_persist_plan`.

Now, `sys.plan_persist_plan_in_memory` is a streaming table-valued function, presenting a tabular view of data only held in internal memory structures. As such, it does not have any unique constraints.

At heart, the query being executed is therefore equivalent to:

    DECLARE @t1 table (plan_id integer NOT NULL);
    DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
    DECLARE @t3 table (plan_id integer NULL);
    
    SELECT 
        T1.plan_id
    FROM @t1 AS T1 
    LEFT JOIN
    (
        SELECT 
            T2.plan_id
        FROM @t2 AS T2
        LEFT JOIN @t3 AS T3 
            ON T3.plan_id = T2.plan_id
    ) AS Q1
        ON Q1.plan_id = T1.plan_id;

...which does not produce join elimination:

[![no join elimination][1]][1]

Getting right to the core of the issue, the problem is the inner query:

    DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
    DECLARE @t3 table (plan_id integer NULL);
    
    SELECT 
        T2.plan_id
    FROM @t2 AS T2
    LEFT JOIN @t3 AS T3 
        ON T3.plan_id = T2.plan_id;

...clearly the left join might result in rows from `@t2` being duplicated because `@t3` has no uniqueness constraint on `plan_id`. Therefore, the join cannot be eliminated:

[![inner query][2]][2]

To workaround this, we can explicitly tell the optimizer that we do not require any duplicate `plan_id` values:

```
DECLARE @t2 table (plan_id integer NOT NULL UNIQUE CLUSTERED);
DECLARE @t3 table (plan_id integer NULL);

SELECT DISTINCT -- added
    T2.plan_id
FROM @t2 AS T2
LEFT JOIN @t3 AS T3 
    ON T3.plan_id = T2.plan_id;
```

The outer join to `@t3` can now be eliminated:

[![join eliminated][3]][3]

Applying that to the real query:

```
SELECT DISTINCT
    T.plan_id
FROM #tears AS T
LEFT JOIN sys.query_store_plan AS QSP
    ON QSP.plan_id = T.plan_id;
```

Equally, we could add `GROUP BY T.plan_id` instead of the `DISTINCT`. Anyway, the optimizer can now correctly reason about the `plan_id` attribute all the way down through the nested views, and eliminate both outer joins as desired:

[![Both joins eliminated][4]][4]

Note that making `plan_id` unique in the temporary table would not be sufficient to obtain join elimination, since it would not preclude incorrect results. We must explicitly reject duplicate `plan_id` values from the final result to allow the optimizer to work its magic here.

  [1]: https://i.stack.imgur.com/iVQdv.png
  [2]: https://i.stack.imgur.com/19qlT.png
  [3]: https://i.stack.imgur.com/EyM0S.png
  [4]: https://i.stack.imgur.com/b0yx2.png

This room is for discussion about this question.

Once logged in you can direct comments to any contributor here.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.