Does SQL Server cache the result of a multi-statement table-valued function? - Databases

TopAnswers Databases
Top Answer
Paul White (imported from SE)
The results of a multi-statement table-valued function (msTVF) are **never** cached or reused across statements (or connections), but there are a couple of ways that an msTVF result may be reused *within* the same statement. To that extent, an msTVF is not necessarily repopulated each time it is called.

### Example msTVF

This (deliberately inefficient) msTVF returns a specified range of integers, with a timestamp on each row:

    IF OBJECT_ID(N'dbo.IntegerRange', 'TF') IS NOT NULL
    	DROP FUNCTION dbo.IntegerRange;
    GO
    CREATE FUNCTION dbo.IntegerRange (@From integer, @To integer)
    RETURNS @T table 
    (
    	n integer PRIMARY KEY, 
    	ts datetime DEFAULT CURRENT_TIMESTAMP
    )
    WITH SCHEMABINDING
    AS
    BEGIN
    	WHILE @From <= @To
    	BEGIN
    		INSERT @T (n)
    		VALUES (@From);
    
    		SET @From = @From + 1;
    	END;
    	RETURN;
    END;

## Static table variable

If all the parameters to the function call are constants (or runtime constants), the execution plan will populate the table variable result once. The remainder of the plan may access the table variable many times. The static nature of the table variable can be recognised from the execution plan. For example:

    SELECT
    	IR.n,
        IR.ts 
    FROM dbo.IntegerRange(1, 5) AS IR
    ORDER BY
    	IR.n;

Returns a result similar to:

[![Simple result][1]][1]

The execution plan is:

[![Simple execution plan][2]][2]

The Sequence operator first calls the Table Valued Function operator, which populates the table variable (note this operator returns no rows). Next, the Sequence calls its second input, which returns the contents of the table variable (using a Clustered Index Scan in this case).

The giveaway that the plan is using a 'static' table variable result is the Table Valued Function operator below a Sequence - the table variable needs to be fully populated once before the remainder of the plan can get going.

### Multiple accesses

To show the table variable result being accessed more than once, we will use a second table with rows numbered from 1 to 5:

    IF OBJECT_ID(N'dbo.T', 'U') IS NOT NULL
    	DROP TABLE dbo.T;
    
    CREATE TABLE dbo.T (i integer NOT NULL);
    
    INSERT dbo.T (i) 
    VALUES (1), (2), (3), (4), (5);

And a new query that joins this table to our function (this could equally be written as an `APPLY`):

    SELECT T.i,
           IR.n,
           IR.ts
    FROM dbo.T AS T
    JOIN dbo.IntegerRange(1, 5) AS IR
    	ON IR.n = T.i;

The result is:

[![Join result][3]][3]

The execution plan:

[![Join plan][4]][4]

As before, the Sequence populates the table variable msTVF result first. Next, nested loops is used to join each row from table `T` to a row from the msTVF result. Since the function definition included a helpful index on the table variable, an index seek can be used.

The key point is that when the parameters to the msTVF are constants (including variables & parameters) or treated as runtime constants for the statement by the execution engine, the plan will feature two separate operators for the msTVF table variable result: one to populate the table; another to access the results, possibly accessing the table multiple times, and possibly making use of indexes declared in the function definition.

## Correlated parameters and non-constant parameters

To highlight the differences when correlated parameters (outer references) or non-constant function parameters are used, we will change the contents of table `T` so the function has much more work to do:

    TRUNCATE TABLE dbo.T;
    
    INSERT dbo.T (i) 
    VALUES (50001), (50002), (50003), (50004), (50005);

The following modified query now uses an outer reference to table `T` in one of the function parameters:

    SELECT T.i,
           IR.n,
           IR.ts
    FROM dbo.T AS T
    CROSS APPLY dbo.IntegerRange(1, T.i) AS IR
    WHERE IR.n = T.i;

This query takes around **8 seconds** to return results like:

[![Correlated result][5]][5]

Notice the time difference between rows in column `ts`. The `WHERE` clause limits the final result for a sensibly-sized output, but the inefficient function still takes a while to populate the table variable with 50,000-odd rows (depending on the correlated value of `i` from table `T`).

The execution plan is:

[![Correlated execution plan][6]][6]

Notice the lack of a Sequence operator. Now, there is a single Table Valued Function operator that populates the table variable and returns its rows *on each iteration* of the nested loops join.

To be clear: with just 5 rows in table T, the Table Valued Function operator runs 5 times. It generates 50,001 rows on the first iteration, 50,002 on the second...and so on. The table variable is 'thrown away' (truncated) between iterations, so each of the five calls is a full population. This is why it is so slow, and each row takes about the same time to appear in the result.

**Side notes:**

Naturally, the scenario above is deliberately contrived to show how poor performance can be when the msTVF populates many rows on each iteration.

A *sensible* implementation of the above code would set *both* msTVF parameters to `i`, and remove the redundant `WHERE` clause. The table variable would still be truncated and repopulated on each iteration, but only with one row each time.

We could also fetch the minimum and maximum `i` values from `T` and store them in variables in a prior step. Calling the function with variables instead of correlated parameters would allow the 'static' table variable pattern to be used as noted earlier.

## Caching for unchanged correlated parameters

Returning to address the original question once more, where the Sequence static pattern cannot be used, SQL Server can **avoid** truncating and repopulating the msTVF table variable if none of the correlated parameters have changed since the prior iteration of a nested loop join.

To demonstrate this, we will replace the contents of `T` with five *identical* `i` values:

    TRUNCATE TABLE dbo.T;
    
    INSERT dbo.T (i) 
    VALUES (50005), (50005), (50005), (50005), (50005);

The query with a correlated parameter again:

    SELECT T.i,
           IR.n,
           IR.ts
    FROM dbo.T AS T
    CROSS APPLY dbo.IntegerRange(1, T.i) AS IR
    WHERE IR.n = T.i;

This time the results appear in around **1.5 seconds**:

[![Identical row results][7]][7]

Note the identical timestamps on each row. The cached result in the table variable is reused for subsequent iterations where the correlated value `i` is unchanged. Reusing the result is much faster than inserting 50,005 rows each time.

The execution plan looks very similar to before:

[![Plan for identical rows][8]][8]

The key difference is in the **Actual Rebinds** and **Actual Rewinds** properties of the Table Valued Function operator:

[![Operator properties][9]][9]

When the correlated parameters do not change, SQL Server can replay (rewind) the current results in the table variable. When the correlation changes, SQL Server must truncate and repopulate the table variable (rebind). The one rebind happens on the first iteration; the four subsequent iterations are all rewinds since the value of `T.i` is unchanged.

  [1]: https://i.stack.imgur.com/WCmqK.png
  [2]: https://i.stack.imgur.com/9WomN.png
  [3]: https://i.stack.imgur.com/50KBJ.png
  [4]: https://i.stack.imgur.com/JR3cE.png
  [5]: https://i.stack.imgur.com/zdZVV.png
  [6]: https://i.stack.imgur.com/3kpgx.png
  [7]: https://i.stack.imgur.com/3MCmL.png
  [8]: https://i.stack.imgur.com/deD58.png
  [9]: https://i.stack.imgur.com/xupDQ.png
1 Answer