Does SQL Server read all of a COALESCE function even if the first argument is not NULL?

sql-server add tag

Curt (imported from SE)

I'm using a T-SQL `COALESCE` function where the first argument will not be null on about 95% of the times it is ran. If the first argument is `NULL`, the second argument is quite a lengthy process:

    SELECT COALESCE(c.FirstName
                    ,(SELECT TOP 1 b.FirstName
                      FROM TableA a 
                      JOIN TableB b ON .....)
                    )

If, for example, `c.FirstName = 'John'`, would SQL Server still run the sub-query?

I know with the VB.NET `IIF()` function, if the second argument is True, the code still reads the third argument (even though it won't be used).

Top Answer

Paul White (imported from SE)

The documentation makes it reasonably clear that the *intention* is for `CASE` to short-circuit.  As [Aaron mentions](https://topanswers.xyz/databases?q=481#a532), there have been several reported instances where this has been shown to not always be true. So far, most of these have been acknowledged as bugs and fixed.

There are other issues with `CASE` (and therefore `COALESCE`) where side-effecting functions or sub-queries are used.  Consider:

    SELECT COALESCE((SELECT CASE WHEN RAND() <= 0.5 THEN 999 END), 999);
    SELECT ISNULL((SELECT CASE WHEN RAND() <= 0.5 THEN 999 END), 999);

The `COALESCE` form often returns null, as described in [a bug report][2] by Hugo Kornelis.

The demonstrated issues with optimizer transforms and common-expression-tracking mean that it is impossible to guarantee that `CASE` will short-circuit in all circumstances.

I think you can be reasonably confident that `CASE` will short-circuit in general (particularly if a reasonably-skilled person inspects the execution plan, and that execution plan is 'enforced' with a plan guide or hints) but if you need an absolute guarantee, you have to write SQL that does not include the expression at all.

  [2]: https://web.archive.org/web/20140329083016/https://connect.microsoft.com/SQLServer/feedback/details/546437/coalesce-subquery-1-may-return-null

Answer #2

Martin Smith (imported from SE)

Another example

    CREATE TABLE T1 (C INT PRIMARY KEY)
    
    CREATE TABLE T2 (C INT PRIMARY KEY)
    
    INSERT INTO T1 
    OUTPUT inserted.* INTO T2
    VALUES (1),(2),(3);


The query

    SET STATISTICS IO ON;

    SELECT T1.C,
           COALESCE(T1.C , CASE WHEN EXISTS (SELECT * FROM T2 WHERE T2.C = T1.C)  THEN -1 END)
    FROM T1
    OPTION (LOOP JOIN)

Shows no reads against `T2` at all. 

The seek of `T2` is under a pass through predicate and the operator is never executed. But

    SELECT T1.C,
           COALESCE(T1.C , CASE WHEN EXISTS (SELECT * FROM T2 WHERE T2.C = T1.C)  THEN -1 END)
    FROM T1
    OPTION (MERGE JOIN)

**Does** show that `T2` is read. Even though no value from `T2` is ever actually needed. 

Of course this is not really surprising but I thought worth adding to the counter example repository if only because it raises the issue of what short circuiting even means in a set based declarative language.

Answer #3

aaron bertrand (imported from SE)

How about this one - as reported to me by Itzik Ben-Gan, who was [told about it by Jaime Lafargue](http://aprendiendosqlserver.blogspot.com.es/2011/09/bug-en-sql-server-case-ejecuta.html)?

    DECLARE @i INT = 1;
    SELECT CASE WHEN @i = 1 THEN 1 ELSE MIN(1/0) END;

Result:

    Msg 8134, Level 16, State 1, Line 2
    Divide by zero error encountered.

There are trivial workarounds of course, but the point is still that `CASE` does not *always* guarantee left-to-right evaluation / short-circuiting. [I reported the bug here](http://web.archive.org/web/20140422050537/http://connect.microsoft.com/SQLServer/feedback/details/690017/case-coalesce-wont-always-evaluate-in-textual-order) and it was closed as "by design." Paul White subsequently filed [this Connect item](http://web.archive.org/web/20140912144142/http://connect.microsoft.com/SQLServer/feedback/details/691535/aggregates-dont-follow-the-semantics-of-case), and it was closed as Fixed. Not because it was fixed per se, but because they updated Books Online with a more accurate description of the scenario where aggregates can change the evaluation order of a `CASE` expression. I recently [blogged more about this here](http://sqlperformance.com/2014/06/t-sql-queries/dirty-secrets-of-the-case-expression).

While I agree that these are edge cases, that *most of the time* you can rely on left-to-right evaluation and short-circuiting, and that these are bugs that contradict the documentation and will probably eventually be fixed (this isn't definite - see the follow-up conversation on [Bart Duncan's blog post][1] to see why), I have to disagree when folks say that something is always true even if there is a single edge case that disproves it. If Itzik and others can find solitary bugs like this, it makes it at least in the realm of possibility that there are other bugs as well. And since we don't know the rest of the OP's query, we can't say for certain that he will rely on this short-circuiting but end up being bitten by it. So to me, the safer answer is:

You can *usually* rely on `CASE` to evaluate left-to-right and short-circuit, as described in the documentation, but it is not accurate to say that you can always do so. There are two demonstrated cases on this page where it is not true, and neither bug has been fixed in any publicly available version of SQL Server.

[Here is another case](http://web.archive.org/web/20140329081147/http://connect.microsoft.com/SQLServer/feedback/details/780132/) (I need to stop doing that) where a `CASE` expression does not evaluate in the order you would expect, even though no aggregates are involved.

  [1]: http://bartduncansql.wordpress.com/2011/03/03/dont-depend-on-expression-short-circuiting-in-t-sql-not-even-with-case/

Answer #4

Martin Smith (imported from SE)

I've come across another case where `CASE` / `COALESCE` do not short circuit. The following TVF will raise a PK violation if passed `1` as a parameter.

    CREATE FUNCTION F (@P INT)
    RETURNS @T TABLE (
      C INT PRIMARY KEY)
    AS
      BEGIN
          INSERT INTO @T
          VALUES      (1),
                      (@P)
    
          RETURN
      END
    
If called as follows
    
    DECLARE @Number INT = 1
    
    SELECT COALESCE(@Number, (SELECT number
                              FROM   master..spt_values
                              WHERE  type = 'P'
                                     AND number = @Number), 
                             (SELECT TOP (1)  C
                              FROM   F(@Number))) 

Or as

    DECLARE @Number INT = 1
    
    SELECT CASE
             WHEN @Number = 1 THEN @Number
             ELSE (SELECT TOP (1) C
                   FROM   F(@Number))
           END 

Both give the result

> Violation of PRIMARY KEY constraint 'PK__F__3BD019A800551192'. Cannot
> insert duplicate key in object 'dbo.@T'. The duplicate key value is
> (1).

showing that the `SELECT` (or at least the table variable population) is still carried out and raises an error even though that branch of the statement should never be reached. The plan for the `COALESCE` version is below.

[![Plan][1]][1]

This rewrite of the query appears to avoid the issue

    SELECT COALESCE(Number, (SELECT number
                              FROM   master..spt_values
                              WHERE  type = 'P'
                                     AND number = Number), 
                             (SELECT TOP (1)  C
                              FROM   F(Number))) 
    FROM (VALUES(1)) V(Number)   

Which gives plan


[![Plan2][2]][2]


  [1]: https://i.stack.imgur.com/JSoTc.png
  [2]: https://i.stack.imgur.com/5IKdF.png

Answer #5

JNK (imported from SE)

**Nope**.  Here's a simple test:

    SELECT COALESCE(1, (SELECT 1/0)) -- runs fine
    SELECT COALESCE(NULL, (SELECT 1/0)) -- throws error

If the second condition is evaluated, an exception is thrown for divide-by-zero.

Per the [MSDN Documentation][1] this is related to how `COALESCE` is viewed by the interpreter - it's just an easy way to write a `CASE` statement.  

`CASE` is well known to be one of the only functions in SQL Server that (mostly) reliably short circuits.

There are some exceptions when comparing to scalar variables and aggregations as shown by Aaron Bertrand in another answer here (and this would apply both to `CASE` and `COALESCE`):

    DECLARE @i INT = 1;
    SELECT CASE WHEN @i = 1 THEN 1 ELSE MIN(1/0) END;

will generate a division by zero error.

This should be considered a bug, and as a rule `COALESCE` will parse from left to right.


  [1]: http://msdn.microsoft.com/en-us/library/ms190349.aspx

Answer #6

Joe Celko (imported from SE)

The actual standard says that all of the WHEN clauses (as well as the ELSE  clause) have to be parsed to determine the data type of the expression as a whole. I'd really have to get out some of my old notes to determine how an error is handled. But just off hand, 1/0  uses integers, so I would assume that while it's an error. It's an error with the integer data type. When you only have nulls in the coalesce list, it's a little trickier to determine the data type, and that's another problem.

Answer #7

erike (imported from SE)

I just wanted to mention a strategy you may not have considered. It may not be a match here, but it does come in handy sometimes. See if this modification gives you any better performance:

    SELECT COALESCE(c.FirstName
                ,(SELECT TOP 1 b.FirstName
                  FROM TableA a 
                  JOIN TableB b ON .....
                  WHERE C.FirstName IS NULL) -- this is the changed part
                )

Another way to do it could be this (basically equivalent, but allows you to access more columns from the other query if necessary):

    SELECT COALESCE(c.FirstName, x.FirstName)
    FROM
       TableC c
       OUTER APPLY (
          SELECT TOP 1 b.FirstName
          FROM
             TableA a 
             JOIN TableB b ON ...
          WHERE
             c.FirstName IS NULL -- the important part
       ) x

Basically this is a technique of "hard" joining tables but including the condition on when any rows at all should be JOINed. In my experience this has really helped execution plans at times.

7 Answers