I have the following [fairly meaningless, just for the purpose of demonstration] query in the Stack Overflow database:
SELECT *
FROM Users u
LEFT JOIN Comments c
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
The only index on the `Users` table is a clustered index on ID.
The `Comments` table has the following Non-Clustered indexes as well as Clustered Index on ID:
CREATE INDEX IX_UserID ON Comments
(
UserID,
PostID
)
CREATE INDEX IX_PostID ON Comments
(
PostID,
UserID
)
The estimated plan for the query is [here][1]:
I can see the first thing the optimizer will do is perform a CI scan on the users table to filter only those users where `DisplayName = Alex`, effectively doing this:
SELECT *
FROM Users u
WHERE u.DisplayName = 'alex'
ORDER BY Id
and retreiving results as such:
[![enter image description here][2]][2]
Then it will scan the comments CI and for every row, look to see if the row satisfies the predicate
u.Id = c.UserId OR u.Id = c.PostId
Despite the two indexes, this CI scan is performed.
Wouldn't it be more efficient if the optimizer did a separate seek on each of the indexes in the Comments table above and join them together?
If I visualise what that would look like, in the screenshot above we can see the first result of the Users CI scan is ID 420
I can visualize what the `IX_UserID` Index looks like using
SELECT UserID,
PostID
FROM Comments
ORDER BY UserID,
PostID
so if I seek to the rows for user ID 420 as an index seek would:
[![enter image description here][3]][3]
for every row where `UserID = 420`, I can look if `u.Id = c.UserId OR u.Id = c.PostId` of, course they all match the `u.Id = c.UserId` part of our predicate,
So for the second part of our index seek, we can seek through our index `IX_PostID` which can be visualised as follows:
SELECT PostID,
UserID
FROM Comments
ORDER BY PostID,
UserID
If I seek to Post ID 420 I can see nothing is there:
[![enter image description here][4]][4]
So we then go back to the results of the CI scan, move to the next row (userId 447) and repeat the process.
The behaviour I have described above is possible using in a `WHERE` clause:
SELECT UserID,
PostID
FROM Comments
WHERE UserID = 420 OR PostID = 420
ORDER BY UserID,
PostID
[Plan here][5]
My question therefore is, why isn't an `OR` condition in a `JOIN` clause able to perform an index seek on appropriate indexes?
[1]:https://www.brentozar.com/pastetheplan/?id=H1L-yr_rU
[2]: https://i.stack.imgur.com/hHUi3.png
[3]: https://i.stack.imgur.com/23B4z.png
[4]: https://i.stack.imgur.com/WMoyX.png
[5]: https://www.brentozar.com/pastetheplan/?id=HJk1HmP48
Top Answer
Paul White
@@@ answer 958
The optimizer doesn't always consider [index union plans](https://docs.microsoft.com/en-nz/archive/blogs/craigfr/index-union) because they are not commonly applicable, can be expensive to generate, and can lead to an explosion of the search space.
The heuristics that prevent the optimizer generating an *index union* plan e.g. via exploration rule `LeftSideJNtoIdxLookup` can be disabled with undocumented trace flag 8726.
Most often it will be more convenient to use the `FORCESEEK` hint, but this trace flag can be used in development or test to see if a good multi-index plan could be generated.
A plan generating an unwanted *index union* can be prevented from doing so with trace flag 8727.
Rather than focusing on how to improve a query like this, which is what the other answers are doing, I'm going to try to answer the question being asked: *why* doesn't the optimizer produce a plan like the one you've described (that scans the Users table, and then seeks into the two indexes on the Comments table).
Here's your original query again (note I'm using `MAXDOP 2` just to simulate what I saw in your execution plans):
```
SELECT *
FROM Users u
LEFT JOIN Comments c
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);
```
And the plan:
[![Screenshot of original left join plan][1]][1]
- Scan of `dbo.Users` with residual predicate to get just the "alex" users
- For each of those users, scan the `dbo.Comments` table and filter matches in the join operator
- **Estimated cost: 293.161 optimizer units**
One attempt to get the plan you want would be to try and *force* a seek on the `dbo.Comments` table:
```
SELECT *
FROM Users u
LEFT JOIN Comments c WITH (FORCESEEK)
ON u.Id = c.UserId OR
u.Id = c.PostId
WHERE u.DisplayName = 'alex'
OPTION (MAXDOP 2);
```
The plans looks like this:
[![Screenshot of left join plan with hints][2]][2]
- scan of the `dbo.Users` table (with a residual predicate to only get users named "alex"),
- seek into each of the two indexes to get the requested Id values (which are unioned together)
- followed up by a key lookup to get the rest of the columns (since we selected *)
- **Estimated cost: 5.98731 optimizer units**
So the answer is that the optimizer is definitely *capable* of producing such a plan. And it doesn't seem to be a cost-based decision (the seek plan looks much cheaper).
My best guess is that this is just some kind of limitation in the optimizer's exploration process - it doesn't seem to favor converting a left join with an `OR` clause into an apply. This is *really* unfortunate in this particular case, as performance is dismal in the scan plan (the query takes 45 seconds on my machine) vs the apply plan (less than 1 second).
As [Rob Farley helpfully points out](https://topanswers.xyz/databases?q=815#a968), using `APPLY` directly (potentially with a `UNION` as well) is a better approach to get the plan you're looking for - both of those produce the "better" version of this plan (the `FORCESEEK` version). I would say that "`OR` in an outer `JOIN`" is kind of a known anti-pattern, and should be avoided since it doesn't seem like the optimizer has great support for that type of query directly.
[1]: https://i.stack.imgur.com/FQApZ.png
[2]: https://i.stack.imgur.com/k1RaY.png
When you have a join, the Query Optimizer will consider how best to satisfy the predicates involved with the various join techniques. It doesn’t try to re-evaluate the query as if had been written with APPLY, which is what you kinda want here, where it would see the right hand side of the join as like a sub-query.
You can try this youself, by doing something like:
SELECT *
FROM Users u
OUTER APPLY (
SELECT *
FROM Comments c
WHERE u.Id = c.UserId
OR u.Id = c.PostId
) c
WHERE u.DisplayName = 'alex'
...but given than OR is annoyingly often not turned into a UNION, I’d prefer:
SELECT *
FROM Users u
OUTER APPLY (
SELECT *
FROM Comments c
WHERE u.Id = c.UserId
UNION
SELECT *
FROM Comments c
WHERE u.Id = c.PostId
) c
WHERE u.DisplayName = 'alex'