I have the following [fairly meaningless, just for the purpose of demonstration] query in the Stack Overflow database:

    SELECT	*
    FROM	Users u
    		LEFT JOIN Comments c
    		    ON u.Id = c.UserId OR
                   u.Id = c.PostId
    WHERE	u.DisplayName = 'alex'

The only index on the `Users` table is a clustered index on ID.

The `Comments` table has the following Non-Clustered indexes as well as Clustered Index on ID:

    CREATE INDEX IX_UserID ON Comments
    (
    	UserID,
    	PostID
    )

    CREATE INDEX IX_PostID ON Comments
    (
    	PostID,
    	UserID
    )

The estimated plan for the query is [here][1]:

I can see the first thing the optimizer will do is perform a CI scan on the users table to filter only those users where `DisplayName = Alex`, effectively doing this:

    SELECT	*
    FROM	Users u
    WHERE	u.DisplayName = 'alex'
    ORDER BY Id

and retreiving results as such:

[![enter image description here][2]][2]

Then it will scan the comments CI and for every row, look to see if the row satisfies the predicate 

    u.Id = c.UserId OR u.Id = c.PostId

Despite the two indexes, this CI scan is performed.

Wouldn't it be more efficient if the optimizer did a separate seek on each of the indexes in the Comments table above and join them together?

If I visualise what that would look like, in the screenshot above we can see the first result of the Users CI scan is ID 420

I can visualize what the `IX_UserID` Index looks like using 

    SELECT		UserID,
    			PostID
    FROM		Comments
    ORDER BY	UserID,
    			PostID

so if I seek to the rows for user ID 420 as an index seek would:

[![enter image description here][3]][3]

for every row where `UserID = 420`, I can look if  `u.Id = c.UserId OR u.Id = c.PostId` of, course they all match the `u.Id = c.UserId` part of our predicate, 

So for the second part of our index seek, we can seek through our index `IX_PostID` which can be visualised as follows:

    SELECT		PostID,
    			UserID
    FROM		Comments
    ORDER BY	PostID,
    			UserID 

If I seek to Post ID 420 I can see nothing is there:

[![enter image description here][4]][4]

So we then go back to the results of the CI scan, move to the next row (userId 447) and repeat the process.

The behaviour I have described above is possible using in a `WHERE` clause:

    SELECT		UserID,
    			PostID
    FROM		Comments
    WHERE		UserID = 420 OR PostID = 420
    ORDER BY	UserID,
    			PostID

[Plan here][5]

My question therefore is, why isn't an `OR` condition in a `JOIN` clause able to perform an index seek on appropriate indexes?

  [1]:https://www.brentozar.com/pastetheplan/?id=H1L-yr_rU
  [2]: https://i.stack.imgur.com/hHUi3.png
  [3]: https://i.stack.imgur.com/23B4z.png
  [4]: https://i.stack.imgur.com/WMoyX.png
  [5]: https://www.brentozar.com/pastetheplan/?id=HJk1HmP48
Top Answer
Paul White
@@@ answer 958

The optimizer doesn't always consider [index union plans](https://docs.microsoft.com/en-nz/archive/blogs/craigfr/index-union) because they are not commonly applicable, can be expensive to generate, and can lead to an explosion of the search space.

The heuristics that prevent the optimizer generating an *index union* plan e.g. via exploration rule `LeftSideJNtoIdxLookup` can be disabled with undocumented trace flag 8726.

Most often it will be more convenient to use the `FORCESEEK` hint, but this trace flag can be used in development or test to see if a good multi-index plan could be generated.

A plan generating an unwanted *index union* can be prevented from doing so with trace flag 8727.
Answer #2
Josh Darnell (imported from SE)
Rather than focusing on how to improve a query like this, which is what the other answers are doing, I'm going to try to answer the question being asked: *why* doesn't the optimizer produce a plan like the one you've described (that scans the Users table, and then seeks into the two indexes on the Comments table).

Here's your original query again (note I'm using `MAXDOP 2` just to simulate what I saw in your execution plans):

```
SELECT  *
FROM    Users u
        LEFT JOIN Comments c
            ON u.Id = c.UserId OR
               u.Id = c.PostId
WHERE   u.DisplayName = 'alex'
OPTION (MAXDOP 2);
```

And the plan:

[![Screenshot of original left join plan][1]][1]

- Scan of `dbo.Users` with residual predicate to get just the "alex" users
- For each of those users, scan the `dbo.Comments` table and filter matches in the join operator
- **Estimated cost: 293.161 optimizer units**

One attempt to get the plan you want would be to try and *force* a seek on the `dbo.Comments` table:

```
SELECT  *
FROM    Users u
        LEFT JOIN Comments c WITH (FORCESEEK)
            ON u.Id = c.UserId OR
               u.Id = c.PostId
WHERE   u.DisplayName = 'alex'
OPTION (MAXDOP 2);
```

The plans looks like this:

[![Screenshot of left join plan with hints][2]][2]

- scan of the `dbo.Users` table (with a residual predicate to only get users named "alex"), 
- seek into each of the two indexes to get the requested Id values (which are unioned together)
- followed up by a key lookup to get the rest of the columns (since we selected *)
- **Estimated cost: 5.98731 optimizer units**

So the answer is that the optimizer is definitely *capable* of producing such a plan.  And it doesn't seem to be a cost-based decision (the seek plan looks much cheaper).

My best guess is that this is just some kind of limitation in the optimizer's exploration process - it doesn't seem to favor converting a left join with an `OR` clause into an apply.   This is *really* unfortunate in this particular case, as performance is dismal in the scan plan (the query takes 45 seconds on my machine)  vs the apply plan (less than 1 second).

As [Rob Farley helpfully points out](https://topanswers.xyz/databases?q=815#a968), using `APPLY` directly (potentially with a `UNION` as well) is a better approach to get the plan you're looking for - both of those produce the "better" version of this plan (the `FORCESEEK` version).  I would say that "`OR` in an outer `JOIN`" is kind of a known anti-pattern, and should be avoided since it doesn't seem like the optimizer has great support for that type of query directly.

  [1]: https://i.stack.imgur.com/FQApZ.png
  [2]: https://i.stack.imgur.com/k1RaY.png
Answer #3
rob farley (imported from SE)
When you have a join, the Query Optimizer will consider how best to satisfy the predicates involved with the various join techniques. It doesn’t try to re-evaluate the query as if had been written with APPLY, which is what you kinda want here, where it would see the right hand side of the join as like a sub-query.

You can try this youself, by doing something like:

    SELECT  *
    FROM    Users u
        OUTER APPLY (
            SELECT *
            FROM Comments c
            WHERE u.Id = c.UserId 
            OR u.Id = c.PostId
        ) c
    WHERE   u.DisplayName = 'alex'

...but given than OR is annoyingly often not turned into a UNION, I’d prefer:

    SELECT  *
    FROM    Users u
        OUTER APPLY (
            SELECT *
            FROM Comments c
            WHERE u.Id = c.UserId 
            UNION
            SELECT *
            FROM Comments c
            WHERE u.Id = c.PostId
        ) c
    WHERE   u.DisplayName = 'alex'

This room is for discussion about this question.

Once logged in you can direct comments to any contributor here.

Enter question or answer id or url (and optionally further answer ids/urls from the same question) from

Separate each id/url with a space. No need to list your own answers; they will be imported automatically.