Cast to date is sargable but is it a good idea?

sql-server add tag

Mikael Eriksson (imported from SE)

In SQL Server 2008 the [date](http://msdn.microsoft.com/en-us/library/bb630352.aspx) datatype was added.

Casting a `datetime` column to `date` is [sargable](https://dba.stackexchange.com/questions/162263/what-does-the-word-sargable-really-mean) and can use an index on the `datetime` column.

    select *
    from T
    where cast(DateTimeCol as date) = '20130101';

The other option you have is to use a range instead.

    select *
    from T
    where DateTimeCol >= '20130101' and
          DateTimeCol < '20130102'


Are these queries equally good or should one be preferred over the other?

Top Answer

Martin Smith (imported from SE)

The mechanism behind the sargability of casting to date is called [dynamic seek][1]. 

SQL Server calls an internal function `GetRangeThroughConvert` to get the start and end of the range.

Somewhat surprisingly this is **not** the same range as your literal values.

Creating a table with a row per page and 1440 rows per day

    CREATE TABLE T
      (
         DateTimeCol DATETIME PRIMARY KEY,
         Filler      CHAR(8000) DEFAULT 'X'
      );
    
    WITH Nums(Num)
         AS (SELECT number
             FROM   spt_values
             WHERE  type = 'P'
                    AND number BETWEEN 1 AND 1440),
         Dates(Date)
         AS (SELECT {d '2012-12-30'} UNION ALL
             SELECT {d '2012-12-31'} UNION ALL
             SELECT {d '2013-01-01'} UNION ALL
             SELECT {d '2013-01-02'} UNION ALL
             SELECT {d '2013-01-03'})
    INSERT INTO T
                (DateTimeCol)
    SELECT DISTINCT DATEADD(MINUTE, Num, Date)
    FROM   Nums,
           Dates 

Then running

    SET STATISTICS IO ON;
    SET STATISTICS TIME ON;
    
    SELECT *
    FROM   T
    WHERE  DateTimeCol >= '20130101'
           AND DateTimeCol < '20130102'
    
    SELECT *
    FROM   T
    WHERE  CAST(DateTimeCol AS DATE) = '20130101'; 

The first query has `1443` reads and the second `2883` so it is reading an entire additional day then discarding it against a residual predicate.

The plan shows the seek predicate is 

    Seek Keys[1]: Start: DateTimeCol > Scalar Operator([Expr1006]), 
                   End: DateTimeCol < Scalar Operator([Expr1007])


So instead of `>= '20130101' ... < '20130102'` it reads `> '20121231' ... < '20130102'` then discards all the `2012-12-31` rows.

Another disadvantage of relying on it is that the cardinality estimates may not be as accurate as with the traditional range query. This can be seen in an amended version of your [SQL Fiddle][2].

All 100 rows in the table now match the predicate (with datetimes 1 minute apart all on the same day). 

The second (range) query correctly estimates that 100 will match and uses a clustered index scan. The `CAST( AS DATE)` query incorrectly estimates that only one row will match and produces a plan with key lookups. 

The statistics aren't ignored completely. If all rows in the table have the same `datetime` and it matches the predicate (e.g. `20130101 00:00:00` or `20130101 01:00:00`) then the plan shows a clustered index scan with an estimated 31.6228 rows. 

    100 ^ 0.75 = 31.6228

So in that case it appears the estimate is derived from [this formula][3]:

>The following table shows the number of conjuncts guessed and the resultant selectivity as a function of input table cardinality of N:

>```none
| Conjuncts | Cardinality | Selectivity |
|-----------|-------------|-------------|
| 1         | N^(3/4)     | N^(-1/4)    |
| 2         | N^(11/16)   | N^(-5/16)   |
| 3         | N^(43/64)   | N^(-21/64)  |
| 4         | N^(171/256) | N^(-85/256) |
| 5         | N^(170/256) | N^(-86/256) |
| 6         | N^(169/256) | N^(-87/256) |
| 7         | N^(168/256) | N^(-88/256) |
| ...       |             |             |
| 175       | N^(0/256)   | N^(-1)      |
>```


If all rows in the table have the same `datetime` and it doesn't match the predicate (e.g. `20130102 01:00:00`) then it falls back to the estimated row count of 1 and the plan with lookups. 

For the cases where the table has more than one `DISTINCT` value the estimated rows seems to be the same as if the query was looking for exactly `20130101 00:00:00`. 

If the statistics histogram happens to have a step at `2013-01-01 00:00:00.000` then the estimate will be based on the `EQ_ROWS` (i.e. not taking into account other times on that date). Otherwise if there is no step it looks as though it uses the `AVG_RANGE_ROWS` from the surrounding steps.

As `datetime` has a precision of approx 3ms in many systems there will be very few actual duplicate values and this number will be 1.

  [1]: https://sql.kiwi/2012/01/dynamic-seeks-and-hidden-implicit-conversions.html
  [2]: http://sqlfiddle.com/#!3/0c907c/1
  [3]: https://blogs.msdn.microsoft.com/ianjo/2006/03/28/disabling-constant-constant-comparison-estimation/

Answer #2

meme (imported from SE)

I know that this has a long-standing Great Answer® from Martin, but I wanted to add in some changes to the behavior here in newer versions of SQL Server. This appears only to have been tested up to 2008R2.

With the new [USE HINTs][1] that make doing some cardinality estimation time travel possible, we can see when things changed.

Using the same setup as in the SQL Fiddle.

    CREATE TABLE T ( ID INT IDENTITY PRIMARY KEY, DateTimeCol DATETIME, Filler CHAR(8000) NULL );
    
    CREATE INDEX IX_T_DateTimeCol ON T ( DateTimeCol );
    
    
    WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
         E02(N) AS (SELECT 1 FROM E00 a, E00 b),
         E04(N) AS (SELECT 1 FROM E02 a, E02 b),
         E08(N) AS (SELECT 1 FROM E04 a, E04 b),
         Num(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY E08.N) FROM E08)
    INSERT INTO T(DateTimeCol)
    SELECT TOP 100 DATEADD(MINUTE, Num.N, '20130101')
    FROM Num;

We can test the different levels like so:

    SELECT *
    FROM   T
    WHERE  CAST(DateTimeCol AS DATE) = '20130101'
    OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_100' ));
    GO
    
    SELECT *
    FROM   T
    WHERE  CAST(DateTimeCol AS DATE) = '20130101'
    OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_110' ));
    GO 
    
    SELECT *
    FROM   T
    WHERE  CAST(DateTimeCol AS DATE) = '20130101'
    OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_120' ));
    GO 
    
    SELECT *
    FROM   T
    WHERE  CAST(DateTimeCol AS DATE) = '20130101'
    OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_130' ));
    GO 
    
    SELECT *
    FROM   T
    WHERE  CAST(DateTimeCol AS DATE) = '20130101'
    OPTION ( USE HINT ( 'QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_140' ));
    GO 

The plans for all of these are [available here][2]. Compat levels 100 and 110 both give the key lookup plan, but starting with compat level 120, we start getting the same scan plan with 100 row estimates. This is true up to compat level 140.

[![NUTS][3]][3]

[![NUTS][4]][4]

[![NUTS][5]][5]

Cardinality estimation for the `>= '20130101', < '20130102'` plans remain at 100, which was expected.


  [1]: https://support.microsoft.com/en-us/help/4342424
  [2]: https://www.brentozar.com/pastetheplan/?id=H14WB1u_Q
  [3]: https://i.stack.imgur.com/LgfFa.jpg
  [4]: https://i.stack.imgur.com/7WVm1.jpg
  [5]: https://i.stack.imgur.com/fHz00.jpg

2 Answers