Why doesn't this recursive CTE with a parameter use an index when it does with a literal?

sql-server add tag

binki (imported from SE)

I am using a recursive CTE on a tree structure to list all the descendants of a particular node in the tree. If I write a literal node value in my `WHERE` clause, SQL Server seems to actually apply the CTE just to that value, giving a query plan with [low actual rows counts, et cetera](https://www.brentozar.com/pastetheplan/?id=HJ0ImE5Hr):

[![query plan with literal value][1]][1]

 However, if I pass the value as a parameter, it seems to [realize (spool) the CTE and then filter it after the fact](https://www.brentozar.com/pastetheplan/?id=B1h7m4qBr):

[![query plan with parameter value][2]][2]

I could be reading the plans wrong. I haven’t noticed a performance issue, but I am worried that realization of the CTE could cause issues with larger data sets, especially in a busier system. Also, I normally compound this traversal on itself: I traverse up to ancestors and back down to descendants (to ensure that I gather all related nodes). Due to how my data is, each set of “related” nodes is rather small, so realization of the CTE doesn’t make sense. And when SQL Server seems to realize the CTE, it is giving me some quite large numbers in its “actual” counts.

Is there a way to get the parameterized version of the query to act like the literal version? I want to put the CTE in a reusable view.

Query with literal:

```sql
CREATE PROCEDURE #c AS BEGIN;
	WITH descendants AS (SELECT
		 t.ParentId Id
		,t.Id DescendantId
	FROM #tree t
	WHERE t.ParentId IS NOT NULL
	UNION ALL SELECT
		 d.Id
		,t.Id DescendantId
	FROM descendants d
	JOIN #tree t ON d.DescendantId = t.ParentId)
	SELECT d.*
	FROM descendants d
	WHERE d.Id = 24
	ORDER BY d.Id, d.DescendantId;
END;
GO
EXEC #c;
```

Query with parameter:

```sql
CREATE PROCEDURE #c (@Id BIGINT) AS BEGIN;
	WITH descendants AS (SELECT
		 t.ParentId Id
		,t.Id DescendantId
	FROM #tree t
	WHERE t.ParentId IS NOT NULL
	UNION ALL SELECT
		 d.Id
		,t.Id DescendantId
	FROM descendants d
	JOIN #tree t ON d.DescendantId = t.ParentId)
	SELECT d.*
	FROM descendants d
	WHERE d.Id = @Id
	ORDER BY d.Id, d.DescendantId;
END;
GO
EXEC #c 24;
```

Setup code:

```sql
DECLARE @count BIGINT = 100000;
CREATE TABLE #tree (
	 Id BIGINT NOT NULL PRIMARY KEY
	,ParentId BIGINT
);
CREATE INDEX tree_23lk4j23lk4j ON #tree (ParentId);
WITH number AS (SELECT
		 CAST(1 AS BIGINT) Value
	UNION ALL SELECT
		 n.Value * 2 + 1
	FROM number n
	WHERE n.Value * 2 + 1 <= @count
	UNION ALL SELECT
		 n.Value * 2
	FROM number n
	WHERE n.Value * 2 <= @count)
INSERT #tree (Id, ParentId)
SELECT n.Value, CASE WHEN n.Value % 3 = 0 THEN n.Value / 4 END
FROM number n;
```

  [1]: https://i.stack.imgur.com/uHkeY.png
  [2]: https://i.stack.imgur.com/04yqe.png

Top Answer

Paul White (imported from SE)

Randi Vertongen's [answer](https://topanswers.xyz/databases?q=304#a277) correctly addresses how you can get the plan you want with the parameterized version of the query. This answer supplements that by addressing the title of the question in case you are interested in the details.

SQL Server rewrites tail-recursive common table expressions (CTEs) as iteration. Everything from the *Lazy Index Spool* down is the runtime implementation of the iterative translation. I wrote a detailed account of how this section of an execution plan works in [answer][2] to [Using EXCEPT in a recursive common table expression](https://dba.stackexchange.com/q/9638/1192).

You want to specify a predicate (filter) *outside* the CTE and have the query optimizer *push this filter down* inside the recursion (rewritten as iteration) and have it applied to the anchor member. This will mean the recursion starts with only those records that match `ParentId = @Id`.

This is quite a reasonable expectation, whether a literal value, variable, or parameter is used; however, the optimizer can only do things for which rules have been written. Rules specify how a logical query tree is modified to achieve a particular transformation. They include logic to make sure the end result is safe - i.e. it returns exactly the same data as the original query specification in all possible cases.

The rule responsible for pushing predicates on a recursive CTE is called `SelOnIterator` - a relational selection (= predicate) on an iterator implementing recursion. More precisely, this rule can copy a selection down to the *anchor* part of recursive iteration:

```none
Sel(Iter(A,R)) -> Sel(Iter(Sel(A),R))
```

This rule can be disabled with the undocumented hint `OPTION(QUERYRULEOFF SelOnIterator)`. When this is used, the optimizer can no longer push predicates with a literal value down to the anchor of a recursive CTE. You don't want that, but it illustrates the point.

Originally, this rule was limited to working on predicates with literal values only. It could also be made to work with variables or parameters by specifying `OPTION (RECOMPILE)`, since that hint enables the *Parameter Embedding Optimization*, whereby the runtime literal value of the variable (or parameter) is used when compiling the plan. The plan is not cached, so the downside of this is a fresh compilation on each execution.

At some point, the `SelOnIterator` rule was improved to also work with variables and parameters. To avoid unexpected plan changes, this was protected under the 4199 trace flag, database compatibility level, and query optimizer hotfix compatibility level. This is quite a normal pattern for optimizer improvements, which are not always documented. Improvements are normally good for most people, but there is always a chance that any change will introduce a regression for someone.

> I want to put the CTE in a reusable view

You could use an inline table-valued function instead of a view. Provide the value you want to push down as a parameter, and place the predicate in the recursive anchor member.

If you prefer, enabling trace flag 4199 globally is also an option. There are many optimizer changes covered by this flag, so you would need to carefully test your workload with it enabled, and be prepared to handle regressions.

Answer #2

randi vertongen (imported from SE)

Though at the moment I don't have the title of the actual hotfix, the better query plan will be used when enabling the query optimizer hotfixes on your version (SQL Server 2012). 

Some other methods are:

 - Using `OPTION(RECOMPILE)` so the filtering happens earlier, on the
   literal value.
 - On SQL Server 2016 or higher the hotfixes before this version are
   applied automatically and the query should also run equivalent to the
   better execution plan.

----------


**Query optimizer Hotfixes**

You can enable these fixes with 

 - Traceflag 4199 before SQL Server 2016
 - `ALTER DATABASE SCOPED CONFIGURATION SET QUERY_OPTIMIZER_HOTFIXES=ON;` starting from SQL Server 2016. (not needed for your fix)

The filtering on `@id` is applied earlier to both the recursive and anchor members in the execution plan with the hotfix enabled.

The traceflag can be added at the query level:

    OPTION(QUERYTRACEON 4199)

When running the query on SQL Server 2012 SP4 GDR or SQL Server 2014 SP3  with Traceflag 4199 the better query plan is chosen:

    ALTER PROCEDURE #c (@Id BIGINT) AS BEGIN;
        WITH descendants AS (SELECT
             t.ParentId Id
            ,t.Id DescendantId
        FROM #tree t 
        WHERE t.ParentId IS NOT NULL
        UNION ALL 
    	SELECT
             d.Id
            ,t.Id DescendantId
        FROM descendants d
        JOIN #tree t ON d.DescendantId = t.ParentId)
        SELECT d.*
        FROM descendants d
    	WHERE d.Id = @Id
        ORDER BY d.Id, d.DescendantId
        OPTION( QUERYTRACEON 4199 );
         
    END;
    GO
    EXEC #c 24;

[Query Plan on SQL Server 2014 SP3 With traceflag 4199][1] 

[Query Plan on SQL Server 2012 SP4 GDR With traceflag 4199][2]

[Query Plan on SQL Server 2012 SP4 GDR without traceflag 4199][3]


The main consensus is to enable traceflag 4199 globally when using a version before SQL Server 2016. Afterwards it is open for discussion whether to enable it or not. A Q/A on that [here][4].

----------


**Compatibility level 130 or 140**

When testing the parameterized query on a database with `compatibility_level` = 130 or 140, the filtering happens earlier:

[![enter image description here][5]][5]

Due to the fact that the 'old' fixes from traceflag 4199 are enabled on SQL Server 2016 and higher.


----------


**OPTION(RECOMPILE)**

Even though a procedure is used, SQL Server will be able to filter on the literal value when adding `OPTION(RECOMPILE);`. 

    ALTER PROCEDURE #c (@Id BIGINT) AS BEGIN;
        WITH descendants AS (SELECT
             t.ParentId Id
            ,t.Id DescendantId
        FROM #tree t 
        WHERE t.ParentId IS NOT NULL
        UNION ALL 
    	SELECT
             d.Id
            ,t.Id DescendantId
        FROM descendants d
        JOIN #tree t ON d.DescendantId = t.ParentId)
        SELECT d.*
        FROM descendants d
    	WHERE d.Id = @Id
        ORDER BY d.Id, d.DescendantId
    OPTION(
    RECOMPILE )
             
    END;
    GO

[![enter image description here][6]][6]

[Query Plan on SQL Server 2012 SP4 GDR With OPTION(RECOMPILE)][7]


  [1]: https://www.brentozar.com/pastetheplan/?id=By2NJvqHB
  [2]: https://www.brentozar.com/pastetheplan/?id=BJnJePqrB
  [3]: https://www.brentozar.com/pastetheplan/?id=BJEmevcrS
  [4]: https://dba.stackexchange.com/questions/102292/trace-flag-4199-enable-globally
  [5]: https://i.stack.imgur.com/GsLZH.png
  [6]: https://i.stack.imgur.com/DqCBp.png
  [7]: https://www.brentozar.com/pastetheplan/?id=SJcEQKcrH

2 Answers