Concatenation Physical Operation: Does it guarantee order of execution?

sql-server add tag

Gordon Linoff (imported from SE)

In standard SQL, the result of a `union all` is not guaranteed to be in any order.  So, something like:

    select 'A' as c union all select 'B'

Could return two rows in any order (although, in practice on any database I know of, 'A' will come before 'B').

In SQL Server, this turns into an execution plan using a "concatenation" physical operation.

I could easily imagine that the concatenation operation would scan its inputs, returning whatever input has records available.  However, I found the following statement on the web ([here][1]):

> The Query Processor will execute this plan in the order that the
> operators appear in the plan, the first is the top one and the last is
> the end one.

Question:  Is this true in practice?  Is this guaranteed to be true?

I haven't found any reference in Microsoft documentation that the inputs are scanned in order, from the first to the last.  On the other hand, whenever I try running it, the results suggest that the inputs are, indeed, processed in order.

Is there a way to have the engine process more than one input at a time?  My tests (using much more complicated expressions than constants) are on a parallel-enabled 8-core machine, and most queries do take advantage of the parallelism.



  [1]: https://www.simple-talk.com/sql/learn-sql-server/showplan-operator-of-the-week---concatenation/#commentform

Top Answer

Mikael Eriksson (imported from SE)

According to [Craig Freedman](http://blogs.msdn.com/b/craigfr/about.aspx) the order of execution for the concatenation operator is guaranteed.

From his blog post [Viewing Query Plans](http://blogs.msdn.com/b/craigfr/archive/2006/06/13/629615.aspx) on MSDN Blogs:

> Note that when an operator has more than one child, the order of the
> children matters.  The topmost child is the first child while the
> bottommost child is the second.  The concatenation operator processes
> the children in this order.


And from books online [Showplan Logical and Physical Operators Reference](http://msdn.microsoft.com/en-us/library/ms191158.aspx)

> The Concatenation physical operator has two or more inputs and one
> output. Concatenation copies rows from the first input stream to the
> output stream, then repeats this operation for each additional input
> stream.

Answer #2

StrayCatDBA (imported from SE)

**No**, there is no documentation from Microsoft guaranteeing the behavior, therefore it is **not guaranteed**.

Additionally, assuming that the Simple Talk article is correct, and that the Concatenation physical operator always processes inputs in the order shown in the *plan* (very likely to be true), then without a guarantee that SQL Server will **always** generate plans that keep the same the order between the query text and the query plan, you're only slightly better off.

We can investigate this further though.  If the query optimizer was able to reorder the Concatenation operator input, there should exist rows in the undocumented DMV, `sys.dm_exec_query_transformation_stats` corresponding to that optimization.
    
    SELECT * FROM sys.dm_exec_query_transformation_stats 
        WHERE name LIKE '%CON%' OR name LIKE '%UNIA%'

On SQL Server 2012 Enterprise Edition, this produces 24 rows. Ignoring the false matches for transformations related to constants, there is one transformation related to the Concatenation Physical Operator `UNIAtoCON` (Union All to Concatenation).  So, at the physical operator level, it appears that once a concatenation operator is selected, it will be processed in the order of the logical Union All operator it was derived from.

---

In fact that is not quite true. Post-optimization rewrites exist that can reorder the inputs to a physical Concatenation operator after cost-based optimization has completed. One example occurs when the Concatenation is subject to a row goal (so it may be important to read from the cheaper input first). See [`UNION ALL` Optimization][2] by Paul White for more details.

That late physical rewrite was functional up to and including SQL Server 2008 R2, but a regression meant it no longer applied to SQL Server 2012 and later. A [fix has been issued][1] that reinstates this rewrite for SQL Server 2014 and later (not 2012) with query optimizer hotfixes enabled (e.g. trace flag 4199). 

---

But about the Logical Union All operator (`UNIA`)?  There is a `UNIAReorderInputs` transformation, which can reorder the inputs.  There are also two physical operators that can be used to implement a logical Union All, `UNIAtoCON` and `UNIAtoMERGE` (Union All to Merge Union).

Therefore it appears that the query optimizer *can* reorder the inputs for a `UNION ALL`; however, it doesn't appear to be a common transformation (zero uses of `UNIAReorderInputs` on the SQL Servers I have readily accessible. We don't know the circumstances that would make the optimizer use `UNIAReorderInputs`; though it is certainly used when a plan guide or use plan hint is used to force a plan generated using the row goal physical reordered inputs mentioned above.

>Is there a way to have the engine process more than one input at a time?

The Concatenation physical operator can exist within a parallel section of a plan.  With some difficulty, I was able to produce a plan with parallel concatenations using the following query:

    SELECT userid, regdate  FROM (  --Users table is around 3mil rows
        SELECT  userid, RegDate FROM users WHERE userid > 1000000
        UNION 
        SELECT  userid, RegDate FROM users WHERE userid < 1000000
        UNION all
        SELECT userid, RegDate FROM users WHERE userid < 2000000
        ) d ORDER BY RegDate OPTION (RECOMPILE)

So, in the strictest sense, the physical Concatenation operator does seem to always process inputs in a consistent fashion (top one first, bottom second); however, the optimizer could switch the order of the inputs before choosing the physical operator, or use a Merge union instead of a Concatenation.

  [1]: https://support.microsoft.com/en-us/help/4023419
  [2]: https://sqlperformance.com/2017/05/sql-plan/union-all-optimization

Answer #3

SE Anon or Wiki user (imported from SE)

I don't know if you can prove that any observed behavior is always guaranteed, one way or the other, unless you can manufacture a counter-example. In the absence of that, the way to fix the order that results are returned, of course, is to add an `ORDER BY`.

I don't know if there is a "fix", or that there exists a need for a fix, if you can demonstrate that in some scenarios the queries are processed in a different order.
  	 	
The lack of any explicit, official documentation suggests to me that you should not depend on this. This is exactly the kind of thing that got people into trouble with `ORDER BY` in a view, and `GROUP BY` without `ORDER BY`, 8 years ago when SQL Server 2005's optimizer was released.

With all of the new features in the newer versions of SQL Server (with more coming), even if you think you can guarantee a specific behavior today, I wouldn't expect it to hold true (until it is documented to do so).
  	 	
Even if you're not depending on this behavior, what are you going to do with the results? Anyway, I wouldn't call a Simple Talk article by an outsider *official*. For all we know this is just a guess based on observation.

Microsoft is never going to publish official documentation saying 'x' is *not* guaranteed to do 'y'. This is one of the reasons we still, almost a decade later, have trouble convincing people that they can't rely on observed ordering without `ORDER BY` - there is no documentation that states "it is not guaranteed."

3 Answers