Why is using a table variable more than twice as fast as a #temp table in this specific case?

sql-server add tag

Martin Smith (imported from SE)

I was looking at the article here 
[Temporary Tables vs. Table Variables and Their Effect on SQL Server Performance][1] and on SQL Server 2008 was able to reproduce similar results to those shown there for 2005. 

When executing the stored procedures (definitions below) with only 10 rows the table variable version out performs the temporary table version by more than two times.

I cleared the procedure cache and ran both stored procedures 10,000 times then repeated the process for another 4 runs. Results below (time in ms per batch)
                 
    T2_Time     V2_Time
    ----------- -----------
    8578        2718      
    6641        2781    
    6469        2813   
    6766        2797
    6156        2719

My question is: **What is the reason for the better performance of the table variable version?**

I've done some investigation. e.g. Looking at the performance counters with 

    SELECT cntr_value
    from sys.dm_os_performance_counters
    where counter_name = 'Temp Tables Creation Rate';

confirms that in both cases the temporary objects are being cached after the first run [as expected][2] rather than created from scratch again for every invocation.

Similarly tracing the `Auto Stats`, `SP:Recompile`, `SQL:StmtRecompile`events in Profiler (screenshot below) shows that these events only occur once (on the first invocation of the `#temp` table stored procedure) and the other 9,999 executions do not raise any of these events. (The table variable version does not get any of these events)

![Trace][3]

The slightly greater overhead of the first run of the stored procedure can in no way account for the big overall difference however as it still only takes a few ms to clear the procedure cache and run both procedures once so I don't believe either statistics or recompiles can be the cause. 


**Create Required Database Objects**

    CREATE DATABASE TESTDB_18Feb2012;
    
    GO
    
    USE TESTDB_18Feb2012;
    
    CREATE TABLE NUM 
      ( 
         n INT PRIMARY KEY, 
         s VARCHAR(128) 
      ); 
    
    WITH NUMS(N) 
         AS (SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY $/0) 
             FROM   master..spt_values v1, 
                    master..spt_values v2) 
    INSERT INTO NUM 
    SELECT N, 
           'Value: ' + CONVERT(VARCHAR, N) 
    FROM   NUMS 
    
    GO
    
    CREATE PROCEDURE [dbo].[T2] @total INT 
    AS 
      CREATE TABLE #T 
        ( 
           n INT PRIMARY KEY, 
           s VARCHAR(128) 
        ) 
    
      INSERT INTO #T 
      SELECT n, 
             s 
      FROM   NUM 
      WHERE  n%100 > 0 
             AND n <= @total 
    
      DECLARE @res VARCHAR(128) 
    
      SELECT @res = MAX(s) 
      FROM   NUM 
      WHERE  n <= @total 
             AND NOT EXISTS(SELECT * 
                            FROM   #T 
                            WHERE  #T.n = NUM.n) 
    GO
    
    CREATE PROCEDURE [dbo].[V2] @total INT 
    AS 
      DECLARE @V TABLE ( 
        n INT PRIMARY KEY, 
        s VARCHAR(128)) 
    
      INSERT INTO @V 
      SELECT n, 
             s 
      FROM   NUM 
      WHERE  n%100 > 0 
             AND n <= @total 
    
      DECLARE @res VARCHAR(128) 
    
      SELECT @res = MAX(s) 
      FROM   NUM 
      WHERE  n <= @total 
             AND NOT EXISTS(SELECT * 
                            FROM   @V V 
                            WHERE  V.n = NUM.n) 
                            
    
    GO
 
**Test Script**
   
    SET NOCOUNT ON;
    
    DECLARE @T1 DATETIME2,
            @T2 DATETIME2,
            @T3 DATETIME2,  
            @Counter INT = 0
    
    SET @T1 = SYSDATETIME()
            
    WHILE ( @Counter < 10000)
    BEGIN
    EXEC dbo.T2 10
    SET @Counter += 1
    END
    
    SET @T2 = SYSDATETIME()
    SET @Counter = 0
            
    WHILE ( @Counter < 10000)
    BEGIN
    EXEC dbo.V2 10
    SET @Counter += 1
    END
    
    SET @T3 = SYSDATETIME()
    
    SELECT DATEDIFF(MILLISECOND,@T1,@T2) AS T2_Time,
           DATEDIFF(MILLISECOND,@T2,@T3) AS V2_Time




  [1]: http://www.codeproject.com/Articles/18972/Temporary-Tables-vs-Table-Variables-and-Their-Effe
  [2]: http://www.sqlmag.com/article/sql-server/caching-of-temporary-objects
  [3]: https://i.stack.imgur.com/AiSQ5.png

Top Answer

Martin Smith (imported from SE)

The output of `SET STATISTICS IO ON` for both looks similar

    SET STATISTICS IO ON;
    PRINT 'V2'
    EXEC dbo.V2 10
    PRINT 'T2'
    EXEC dbo.T2 10

Gives

    V2
    Table '#58B62A60'. Scan count 0, logical reads 20
    Table 'NUM'. Scan count 1, logical reads 3
    
    Table '#58B62A60'. Scan count 10, logical reads 20
    Table 'NUM'. Scan count 1, logical reads 3
    
    T2
    Table '#T__ ... __00000000E2FE'. Scan count 0, logical reads 20
    Table 'NUM'. Scan count 1, logical reads 3
    
    Table '#T__ ... __00000000E2FE'. Scan count 0, logical reads 20
    Table 'NUM'. Scan count 1, logical reads 3

And as Aaron points out in the comments the plan for the table variable version is actually less efficient as whilst both have a nested loops plan driven by an index seek on `dbo.NUM` the `#temp` table version performs a seek into the index on `[#T].n = [dbo].[NUM].[n]` with residual predicate `[#T].[n]<=[@total]` whereas the table variable version performs an index seek on `@V.n <= [@total]` with residual predicate `@V.[n]=[dbo].[NUM].[n]` and so processes more rows (which is why this plan performs so poorly for larger number of rows)

Using [Extended Events][1] to look at the wait types for the specific spid gives these results for 10,000 executions of `EXEC dbo.T2 10`

    +---------------------+------------+----------------+----------------+----------------+
    |                     |            |     Total      | Total Resource |  Total Signal  |
    | Wait Type           | Wait Count | Wait Time (ms) | Wait Time (ms) | Wait Time (ms) |
    +---------------------+------------+----------------+----------------+----------------+
    | SOS_SCHEDULER_YIELD | 16         | 19             | 19             | 0              |
    | PAGELATCH_SH        | 39998      | 14             | 0              | 14             |
    | PAGELATCH_EX        | 1          | 0              | 0              | 0              |
    +---------------------+------------+----------------+----------------+----------------+

and these results for 10,000 executions of `EXEC dbo.V2 10`

    +---------------------+------------+----------------+----------------+----------------+
    |                     |            |     Total      | Total Resource |  Total Signal  |
    | Wait Type           | Wait Count | Wait Time (ms) | Wait Time (ms) | Wait Time (ms) |
    +---------------------+------------+----------------+----------------+----------------+
    | PAGELATCH_EX        | 2          | 0              | 0              | 0              |
    | PAGELATCH_SH        | 1          | 0              | 0              | 0              |
    | SOS_SCHEDULER_YIELD | 676        | 0              | 0              | 0              |
    +---------------------+------------+----------------+----------------+----------------+

So it is clear that the number of `PAGELATCH_SH` waits is much higher in the `#temp` table case. I'm not aware of any way of adding the wait resource to the extended events trace so to investigate this further I ran 

    WHILE 1=1
    EXEC dbo.T2 10

Whilst in another connection polling `sys.dm_os_waiting_tasks`

    CREATE TABLE #T(resource_description NVARCHAR(2048))
    
    WHILE 1=1
    INSERT INTO #T
    SELECT resource_description
    FROM sys.dm_os_waiting_tasks
    WHERE session_id=<spid_of_other_session> and wait_type='PAGELATCH_SH'

After leaving that running for about 15 seconds it had gathered the following results

    +-------+----------------------+
    | Count | resource_description |
    +-------+----------------------+
    |  1098 | 2:1:150              |
    |  1689 | 2:1:146              |
    +-------+----------------------+

Both of these pages being latched belong to (different) non clustered indexes on the `tempdb.sys.sysschobjs` base table named `'nc1'` and `'nc2'`.

Querying `tempdb.sys.fn_dblog` during the runs indicates that the number of log records added by the first execution of each stored procedure was somewhat variable but for subsequent executions the number added by each iteration was very consistent and predictable. Once the procedure plans are cached the number of log entries are about half those needed for the `#temp` version.

    +-----------------+----------------+------------+
    |                 | Table Variable | Temp Table |
    +-----------------+----------------+------------+
    | First Run       |            126 | 72 or 136  |
    | Subsequent Runs |             17 | 32         |
    +-----------------+----------------+------------+

Looking at the transaction log entries in more detail for the `#temp` table version of the SP each subsequent invocation of the stored procedure creates three transactions and the table variable one only two.

    +---------------------------------+----+---------------------------------+----+
    |           #Temp Table                |         @Table Variable              |
    +---------------------------------+----+---------------------------------+----+
    | CREATE TABLE                    |  9 |                                 |    |
    | INSERT                          | 12 | TVQuery                         | 12 |
    | FCheckAndCleanupCachedTempTable | 11 | FCheckAndCleanupCachedTempTable |  5 |
    +---------------------------------+----+---------------------------------+----+

The `INSERT`/`TVQUERY` transactions are identical except for the name. This contains the log records for each of the 10 rows inserted to the temporary table or table variable plus the `LOP_BEGIN_XACT`/ `LOP_COMMIT_XACT` entries.

The `CREATE TABLE` transaction only appears in the `#Temp` version and looks as follows.

    +-----------------+-------------------+---------------------+
    |    Operation    |      Context      |    AllocUnitName    |
    +-----------------+-------------------+---------------------+
    | LOP_BEGIN_XACT  | LCX_NULL          |                     |
    | LOP_SHRINK_NOOP | LCX_NULL          |                     |
    | LOP_MODIFY_ROW  | LCX_CLUSTERED     | sys.sysschobjs.clst |
    | LOP_DELETE_ROWS | LCX_MARK_AS_GHOST | sys.sysschobjs.nc1  |
    | LOP_INSERT_ROWS | LCX_INDEX_LEAF    | sys.sysschobjs.nc1  |
    | LOP_DELETE_ROWS | LCX_MARK_AS_GHOST | sys.sysschobjs.nc2  |
    | LOP_INSERT_ROWS | LCX_INDEX_LEAF    | sys.sysschobjs.nc2  |
    | LOP_MODIFY_ROW  | LCX_CLUSTERED     | sys.sysschobjs.clst |
    | LOP_COMMIT_XACT | LCX_NULL          |                     |
    +-----------------+-------------------+---------------------+

The `FCheckAndCleanupCachedTempTable` transaction appears in both but has 6 additional entries in the `#temp` version. These are the 6 rows referring to `sys.sysschobjs` and they have exactly the same pattern as above.

    +-----------------+-------------------+----------------------------------------------+
    |    Operation    |      Context      |                AllocUnitName                 |
    +-----------------+-------------------+----------------------------------------------+
    | LOP_BEGIN_XACT  | LCX_NULL          |                                              |
    | LOP_DELETE_ROWS | LCX_NONSYS_SPLIT  | dbo.#7240F239.PK__#T________3BD0199374293AAB |
    | LOP_HOBT_DELTA  | LCX_NULL          |                                              |
    | LOP_HOBT_DELTA  | LCX_NULL          |                                              |
    | LOP_MODIFY_ROW  | LCX_CLUSTERED     | sys.sysschobjs.clst                          |
    | LOP_DELETE_ROWS | LCX_MARK_AS_GHOST | sys.sysschobjs.nc1                           |
    | LOP_INSERT_ROWS | LCX_INDEX_LEAF    | sys.sysschobjs.nc1                           |
    | LOP_DELETE_ROWS | LCX_MARK_AS_GHOST | sys.sysschobjs.nc2                           |
    | LOP_INSERT_ROWS | LCX_INDEX_LEAF    | sys.sysschobjs.nc2                           |
    | LOP_MODIFY_ROW  | LCX_CLUSTERED     | sys.sysschobjs.clst                          |
    | LOP_COMMIT_XACT | LCX_NULL          |                                              |
    +-----------------+-------------------+----------------------------------------------+

Looking at these 6 rows in both transactions they correspond to the same operations. The first `LOP_MODIFY_ROW, LCX_CLUSTERED` is an update to the `modify_date` column in `sys.objects`. The remaining five rows are all concerned with object renaming. Because `name` is a key column of both affected NCIs (`nc1` and `nc2`) this is carried out as a delete/insert for those then it goes back to the clustered index and updates that too.

It appears that for the `#temp` table version when the stored procedure ends part of the clean up carried out by the `FCheckAndCleanupCachedTempTable` transaction is to rename the temp table from something like `#T__________________________________________________________________________________________________________________00000000E316` to a different internal name such as `#2F4A0079` and when it is entered the `CREATE TABLE` transaction renames it back. This flip flopping name can be seen by in one connection executing `dbo.T2` in a loop whilst in another 

    WHILE 1=1
    SELECT name, object_id, create_date, modify_date
    FROM tempdb.sys.objects 
    WHERE name LIKE '#%'

Example Results

![Screenshot][2]

So one potential explanation for the observed performance differential as alluded to by Alex is that it is this additional work maintaining the system tables in `tempdb` that is responsible.

-----

Running both procedures in a loop the Visual Studio Code profiler reveals the following

    +-------------------------------+--------------------+-------+-----------+
    |           Function            |    Explanation     | Temp  | Table Var |
    +-------------------------------+--------------------+-------+-----------+
    | CXStmtDML::XretExecute        | Insert ... Select  | 16.93 | 37.31     |
    | CXStmtQuery::ErsqExecuteQuery | Select Max         | 8.77  | 23.19     |
    +-------------------------------+--------------------+-------+-----------+
    | Total                         |                    | 25.7  | 60.5      |
    +-------------------------------+--------------------+-------+-----------+

The table variable version spends about 60% of the time performing the insert statement and the subsequent select whereas the temporary table is less than half that. This is inline with the timings shown in the OP and with the conclusion above that the difference in performance is down to time spent performing ancillary work not due to time spent in the query execution itself.

The most important functions contributing towards the "missing" 75% in the temporary table version are

    +------------------------------------+-------------------+
    |              Function              | Inclusive Samples |
    +------------------------------------+-------------------+
    | CXStmtCreateTableDDL::XretExecute  | 26.26%            |
    | CXStmtDDL::FinishNormalImp         | 4.17%             |
    | TmpObject::Release                 | 27.77%            |
    +------------------------------------+-------------------+
    | Total                              | 58.20%            |
    +------------------------------------+-------------------+

Under both the create and release functions the function `CMEDProxyObject::SetName` is shown with an inclusive sample value of `19.6%`. From which I infer that 39.2% of the time in the temporary table case is taken up with the renaming described earlier.

And the largest ones in the table variable version contributing to the other 40% are

    +-----------------------------------+-------------------+
    |             Function              | Inclusive Samples |
    +-----------------------------------+-------------------+
    | CTableCreate::LCreate             | 7.41%             |
    | TmpObject::Release                | 12.87%            |
    +-----------------------------------+-------------------+
    | Total                             | 20.28%            |
    +-----------------------------------+-------------------+

## Temporary Table Profile

[![enter image description here][3]][3]




## Table Variable Profile

[![enter image description here][4]][4]
 


  [1]: http://sqlskills.com/BLOGS/PAUL/post/Capturing-wait-stats-for-a-single-operation.aspx
  [2]: https://i.stack.imgur.com/yXNRJ.png
  [3]: https://i.stack.imgur.com/FE4TL.png
  [4]: https://i.stack.imgur.com/kJFDv.png

Answer #2

meme (imported from SE)

Disco Inferno
--
Since this is an older question, I decided to revisit the issue on newer versions of SQL Server to see if the same performance profile still exists, or if the characteristics have changed at all. 

Specifically, the addition of [in-memory system tables for SQL Server 2019][1] seems a worthwhile occasion to re-test.

I'm using a slightly different test harness, since I ran into this issue while working on something else.

Testing, testing
--
Using the [2013 version of Stack Overflow][2], I have this index and these two procedures:

Index:

    CREATE INDEX ix_whatever 
        ON dbo.Posts(OwnerUserId) INCLUDE(Score);
    GO

 


Temp table:
    
        CREATE OR ALTER PROCEDURE dbo.TempTableTest(@Id INT)
        AS
        BEGIN
        SET NOCOUNT ON;
            
        	CREATE TABLE #t(i INT NOT NULL);
        	DECLARE @i INT;
        
        	INSERT #t ( i )
            SELECT p.Score
        	FROM dbo.Posts AS p
        	WHERE p.OwnerUserId = @Id;
        
        	SELECT @i = AVG(t.i)
        	FROM #t AS t;
        
        END;
        GO 


Table variable:
        
        CREATE OR ALTER PROCEDURE dbo.TableVariableTest(@Id INT)
        AS
        BEGIN
        SET NOCOUNT ON;
            
        	DECLARE @t TABLE (i INT NOT NULL);
        	DECLARE @i INT;
        
        	INSERT @t ( i )
            SELECT p.Score
        	FROM dbo.Posts AS p
        	WHERE p.OwnerUserId = @Id;
        
        	SELECT @i = AVG(t.i)
        	FROM @t AS t;
        
        END;
        GO 

To prevent any potential [ASYNC_NETWORK_IO waits][3], I'm using wrapper procedures.

    CREATE PROCEDURE #TT AS
    SET NOCOUNT ON;
        DECLARE @i INT = 1;
        DECLARE @StartDate DATETIME2(7) = SYSDATETIME();
        
        WHILE @i <= 50000
            BEGIN
                EXEC dbo.TempTableTest @Id = @i;
                SET @i += 1;
            END;
        SELECT DATEDIFF(MILLISECOND, @StartDate, SYSDATETIME()) AS [ElapsedTimeMilliseconds];
    GO
       
    CREATE PROCEDURE #TV AS
    SET NOCOUNT ON;
        DECLARE @i INT = 1;
        DECLARE @StartDate DATETIME2(7) = SYSDATETIME();
        
        WHILE @i <= 50000
            BEGIN
                EXEC dbo.TableVariableTest @Id = @i;
                SET @i += 1;
            END;
        SELECT DATEDIFF(MILLISECOND, @StartDate, SYSDATETIME()) AS [ElapsedTimeMilliseconds];
    GO


SQL Server 2017
--
Since 2014 and 2016 are basically RELICS at this point, I'm starting my testing with 2017. Also, for brevity, I'm jumping right to profiling the code with [Perfview][4]. In real life, I looked at waits, latches, spinlocks, crazy trace flags, and other stuff. 

Profiling the code is the only thing that revealed anything of interest.

**Time difference:**

- Temp Table: 17891 ms
- Table Variable: 5891 ms

Still a very clear difference, eh? But what's SQL Server hitting now?

[![NUTS][5]][5]

Looking at the top two increases in the diffed samples, we see `sqlmin` and `sqlsqllang!TCacheStore<CacheClockAlgorithm>::GetNextUserDataInHashBucket` are the two biggest offenders.

[![NUTS][6]][6]

Judging by the names in the call stacks, cleaning up and internally renaming temp tables seems to be the biggest time sucks in the temp table call vs. the table variable call.

Even though table variables are internally backed by temp tables, this doesn't seem to be an issue.

    SET STATISTICS IO ON;
    DECLARE @t TABLE(id INT);
    SELECT * FROM @t AS t;

> Table '#B98CE339'. Scan count 1

Looking through the call stacks for the table variable test doesn't show either of the main offenders at all:

[![NUTS][7]][7]

SQL Server 2019 (Vanilla)
--
Alright, so this is still an issue in SQL Server 2017, is anything different in 2019 out of the box?

First, to show there's nothing up my sleeve:

    SELECT c.name,
           c.value_in_use,
           c.description
    FROM sys.configurations AS c
    WHERE c.name = 'tempdb metadata memory-optimized';

[![NUTS][8]][8]

**Time difference:**

- Temp table: 15765 ms
- Table Variable: 7250 ms

Both procedures were different. The temp table call was a couple seconds faster, and the table variable call was about 1.5 seconds slower. The table variable slow down may be partially explained by [table variable deferred compilation][9], a new optimizer choice in 2019.

Looking at the diff in Perfview, it has changed a bit -- sqlmin is no longer there -- but `sqllang!TCacheStore<CacheClockAlgorithm>::GetNextUserDataInHashBucket` is.

[![NUTS][10]][10]

SQL Server 2019 (In-Memory Tempdb system tables)
--
What about this new in memory system table thing? Hm? Sup with that?

Let's turn it on!

    EXEC sys.sp_configure @configname = 'advanced', 
                          @configvalue = 1  
    RECONFIGURE;
    
    EXEC sys.sp_configure @configname = 'tempdb metadata memory-optimized', 
                          @configvalue = 1 
    RECONFIGURE;

Note that this requires a SQL Server restart to kick in, so pardon me while I reboot SQL on this lovely Friday afternoon.

Now things look different:

    SELECT c.name,
           c.value_in_use,
           c.description
    FROM sys.configurations AS c
    WHERE c.name = 'tempdb metadata memory-optimized';
    
    SELECT *, 
           OBJECT_NAME(object_id) AS object_name, 
    	   @@VERSION AS sql_server_version
    FROM tempdb.sys.memory_optimized_tables_internal_attributes;


[![NUTS][11]][11]


**Time difference:**

 - Temp table: 11638 ms
 - Table variable: 7403 ms

The temp tables did about 4 seconds better! That's something.

I like something.

This time, the Perfview diff isn't very interesting. Side by side, it's interesting to note how close the times are across the board:

[![NUTS][12]][12]

One interesting point in the diff are the calls to `hkengine!`, which may seem obvious since hekaton-ish features are now in use.

[![NUTS][13]][13]

As far as the top two items in the diff, I can't make much of `ntoskrnl!?`:

[![NUTS][14]][14]

Or `sqltses!CSqlSortManager_80::GetSortKey`, but they're here for Smrtr Ppl™ to look at:

[![NUTS][15]][15]

Note that there is an undocumented and definitely not safe for production so please don't use it [startup trace flag][16] you can use to have additional temp table system objects (sysrowsets, sysallocunits, and sysseobjvalues) included in the in-memory feature, but it didn't make a noticeable difference in execution times in this case.

Roundup
--
Even in newer versions of SQL server, high frequency calls to table variables are much faster than high frequency calls to temp tables. 

Though it's tempting to blame compilations, recompilations, auto stats, latches, spinlocks, caching, or other issues, the issue is clearly still around managing temp table cleanup. 

It's a closer call in SQL Server 2019 with in-memory system tables enabled, but table variables still perform better when call frequency is high. 

Of course, as a vaping sage once mused: "use table variables when plan choice isn't an issue".


  [1]: https://docs.microsoft.com/en-us/sql/relational-databases/databases/tempdb-database?view=sql-server-2017#memory-optimized-tempdb-metadata
  [2]: http://bit.ly/Stack2013
  [3]: https://dba.stackexchange.com/q/232816/32281
  [4]: https://github.com/microsoft/perfview
  [5]: https://i.stack.imgur.com/e3mI2.jpg
  [6]: https://i.stack.imgur.com/BaoJv.jpg
  [7]: https://i.stack.imgur.com/mkeXm.jpg
  [8]: https://i.stack.imgur.com/X7Ggr.jpg
  [9]: https://blogs.msdn.microsoft.com/sqlserverstorageengine/2018/07/16/public-preview-of-table-variable-deferred-compilation-in-azure-sql-database/
  [10]: https://i.stack.imgur.com/ZTyNy.jpg
  [11]: https://i.stack.imgur.com/cXrkW.jpg
  [12]: https://i.stack.imgur.com/rrCGz.jpg
  [13]: https://i.stack.imgur.com/LT3E3.jpg
  [14]: https://i.stack.imgur.com/l2vDJ.jpg
  [15]: https://i.stack.imgur.com/ORdLd.jpg
  [16]: https://dba.stackexchange.com/users/3899/chris-betti

2 Answers