sql-server add tag
Joe Obbish
This is a reproduction of an application code issue. I don't need workarounds. I am just curious about the technical explanation.

For the query:

```
CREATE TABLE #T1 (ID1 INT NOT NULL);
CREATE TABLE #T2 (ID1 INT NOT NULL, ID2 INT NOT NULL);

SELECT #T1.ID1, s.ID2
FROM #T1
INNER JOIN (SELECT 0 ID1) f ON #T1.ID1 = f.ID1
LEFT OUTER JOIN (
    SELECT ID1, MIN(ID2) ID2
    FROM #T2
    GROUP BY ID1
) s ON #T1.ID1 = s.ID1
OPTION (HASH JOIN);
```

I would expect the optimizer to simplify away the join to the derived table `f` and to therefore be eligible for a hash join plan. Instead, I get an error:

> Msg 8622, Level 16, State 1, Line 5  
> Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN.

Why does this occur?

---

Also, why does the error **not** occur for:


```
SELECT #T1.ID1, s.ID2
FROM #T1
INNER JOIN (SELECT 0 ID1) f ON #T1.ID1 = f.ID1
LEFT OUTER JOIN (
	SELECT ID1, ID2
	FROM #T2
) s ON #T1.ID1 = s.ID1
OPTION (HASH JOIN);
```
Top Answer
i-one
This is because of `HASH` join algorithm requires join equality predicate, but query optimizer eliminates it during simplification.

One can see it by examining optimization path of the query (trace flags 8606 and 3604).

This is the logical tree of the query after simplification stage

```none
*** Simplified Tree: ***
    LogOp_LeftOuterJoin
        LogOp_Select
            LogOp_Get TBL: #T1 #T1 TableID=-1205191991 TableReferenceID=0 IsRow: COL: IsBaseRow1001 
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [tempdb].[dbo].[#T1].ID1
                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)
        LogOp_Project
            LogOp_GbAgg OUT(COL: Expr1007 ,)
                LogOp_Select
                    LogOp_Get TBL: #T2 #T2 TableID=-1189191934 TableReferenceID=0 IsRow: COL: IsBaseRow1005 
                    ScaOp_Comp x_cmpEq
                        ScaOp_Identifier QCOL: [tempdb].[dbo].[#T2].ID1
                        ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)
                AncOp_PrjList 
                    AncOp_PrjEl COL: Expr1007 
                        ScaOp_AggFunc stopMin Transformed
                            ScaOp_Identifier QCOL: [tempdb].[dbo].[#T2].ID2
            AncOp_PrjList 
                AncOp_PrjEl COL: Expr1009 
                    ScaOp_Identifier COL: Expr1007 
        ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)
*******************
```

it can be expressed in T-SQL as

```sql
SELECT TG1.ID1, TG2.ID2
FROM (
    SELECT ID1
    FROM #T1
    WHERE ID1 = 0
) TG1
LEFT OUTER JOIN (
    SELECT MIN(ID2) ID2
    FROM #T2
    WHERE ID1 = 0
) TG2 ON 1 = 1
```

Predicates `#T1.ID1 = 0` and `#T2.ID1 = 0` are created by the rule named `ImpliedPredInnerAndAllLeftJn` first, and then pushed towards data accessors (`LogOp_Get` nodes) of `#T1` and `#T2` respectively as a result of a number of other simplifications applied.

There are also join collapse and project normalization stages as usual, but they do not change logical tree shape. So, right before entering plan search stages we have `LEFT JOIN` between two data sources `ON 1 = 1` (well, `ON True` actually) and there is the `OPTION (HASH JOIN)` as well.

The only rules that can be used for `LEFT JOIN` implementation in this situation are `LOJNtoHS` and `ROJNtoHS` (since join commute is considered too), but predicate `1 = 1` does not supplies optimizer with information regarding what column should be used to build hash table and what column should be used for probing then. The lack of allowed implementation rules appropriate to the task causes optimizer to fail with _Msg 8622_.

Out of curiosity we can disable `ImpliedPredInnerAndAllLeftJn` rule, that will prevent simplifying join predicate out and allow query to compile

![pic1.png](/image?hash=99e8bd43a2960bda74888bb5dade210f107b1dcb7e7bc728dcf24162a7a10773)

See [Implied Predicates and Query Hints](https://docs.microsoft.com/en-us/archive/blogs/craigfr/implied-predicates-and-query-hints) by Craig Freedman also.

---

The second query is a little different. Right before entering search stages the logical tree looks as

```none
*** Tree After Project Normalization ***
    LogOp_LeftOuterJoin
        LogOp_Select
            LogOp_Get TBL: #T1 #T1 TableID=-1205191991 TableReferenceID=0 IsRow: COL: IsBaseRow1001 
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [tempdb].[dbo].[#T1].ID1
                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)
        LogOp_Select
            LogOp_Project
                LogOp_Get TBL: #T2 #T2 TableID=-1189191934 TableReferenceID=0 IsRow: COL: IsBaseRow1005 
                AncOp_PrjList 
                    AncOp_PrjEl COL: Expr1008 
                        ScaOp_Identifier QCOL: [tempdb].[dbo].[#T2].ID2
            ScaOp_Comp x_cmpEq
                ScaOp_Identifier QCOL: [tempdb].[dbo].[#T2].ID1
                ScaOp_Const TI(int,ML=4) XVAR(int,Not Owned,Value=0)
        ScaOp_Const TI(bit,ML=1) XVAR(bit,Not Owned,Value=1)
****************************************
```

which can be expressed in T-SQL as

```sql
SELECT TG1.ID1, TG2.ID2
FROM (
    SELECT ID1
    FROM #T1
    WHERE ID1 = 0
) TG1
LEFT OUTER JOIN (
    SELECT ID1, ID2
    FROM #T2
    WHERE ID1 = 0
) TG2 ON 1 = 1
```

So, we have `LEFT JOIN` between two sources of data `ON 1 = 1` along with `OPTION (HASH JOIN)` again, but compilation does not fail as it was in the previous case. Why?

Query optimizer uses the fact that if `A = X` and `B = X` then `A = B`. In this case `#T1.ID1 = 0`, `#T2.ID1 = 0`, `#T1.ID1` is the column of `TG1` and `#T2.ID1` is the column of `TG2`. It is recognized by optimizer and join equality predicate `#T1.ID1 = #T2.ID1` is constructed. It happens as a part of `LOJNtoHS` substitution build logic (trace flags 8019 and 8021 to see this step)

```none
Apply Rule: LOJNtoHS - LOJN -> HS

Rule Result: group=16 3 <LOJNtoHS>PhyOp_HashJoinx_jtLeftOuter 11 15 17 (Distance = 1)

Subst:
  PhyOp_HashJoinx_jtLeftOuter (QCOL: [tempdb].[dbo].[#T1].ID1) = (QCOL: [tempdb].[dbo].[#T2].ID1)
      Leaf-Op 11
      Leaf-Op 15
      ScaOp_Comp x_cmpIs
          ScaOp_Identifier QCOL: [tempdb].[dbo].[#T1].ID1
          ScaOp_Identifier QCOL: [tempdb].[dbo].[#T2].ID1
```

Speaking precisely `x_cmpIs` implements `IS NOT DISTINCT FROM` comparison (in terms of SQL Standard), i.e. `(A = B) OR (A IS NULL) AND (B IS NULL)`.

This logic is common among all hash join implementation rules (`JNtoHS`, `ROJNtoHS`, etc.) and available since SQL Server 2012 and onward.

In the first case optimizer was not able to construct join equality predicate, because of `#T2.ID1` column was simplified out from the `SELECT` list of `s` (vector aggregation transformed to scalar).

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.