I have the following [fairly meaningless, just for the purpose of demonstration] query in the Stack Overflow database: SELECT * FROM Users u LEFT JOIN Comments c ON u.Id = c.UserId OR u.Id = c.PostId WHERE u.DisplayName = 'alex' The only index on the `Users` table is a clustered index on ID. The `Comments` table has the following Non-Clustered indexes as well as Clustered Index on ID: CREATE INDEX IX_UserID ON Comments ( UserID, PostID ) CREATE INDEX IX_PostID ON Comments ( PostID, UserID ) The estimated plan for the query is [here]: I can see the first thing the optimizer will do is perform a CI scan on the users table to filter only those users where `DisplayName = Alex`, effectively doing this: SELECT * FROM Users u WHERE u.DisplayName = 'alex' ORDER BY Id and retreiving results as such: [![enter image description here]] Then it will scan the comments CI and for every row, look to see if the row satisfies the predicate u.Id = c.UserId OR u.Id = c.PostId Despite the two indexes, this CI scan is performed. Wouldn't it be more efficient if the optimizer did a separate seek on each of the indexes in the Comments table above and join them together? If I visualise what that would look like, in the screenshot above we can see the first result of the Users CI scan is ID 420 I can visualize what the `IX_UserID` Index looks like using SELECT UserID, PostID FROM Comments ORDER BY UserID, PostID so if I seek to the rows for user ID 420 as an index seek would: [![enter image description here]] for every row where `UserID = 420`, I can look if `u.Id = c.UserId OR u.Id = c.PostId` of, course they all match the `u.Id = c.UserId` part of our predicate, So for the second part of our index seek, we can seek through our index `IX_PostID` which can be visualised as follows: SELECT PostID, UserID FROM Comments ORDER BY PostID, UserID If I seek to Post ID 420 I can see nothing is there: [![enter image description here]] So we then go back to the results of the CI scan, move to the next row (userId 447) and repeat the process. The behaviour I have described above is possible using in a `WHERE` clause: SELECT UserID, PostID FROM Comments WHERE UserID = 420 OR PostID = 420 ORDER BY UserID, PostID [Plan here] My question therefore is, why isn't an `OR` condition in a `JOIN` clause able to perform an index seek on appropriate indexes? :https://www.brentozar.com/pastetheplan/?id=H1L-yr_rU : https://i.stack.imgur.com/hHUi3.png : https://i.stack.imgur.com/23B4z.png : https://i.stack.imgur.com/WMoyX.png : https://www.brentozar.com/pastetheplan/?id=HJk1HmP48
@@@ answer 958 The optimizer doesn't always consider [index union plans](https://docs.microsoft.com/en-nz/archive/blogs/craigfr/index-union) because they are not commonly applicable, can be expensive to generate, and can lead to an explosion of the search space. The heuristics that prevent the optimizer generating an *index union* plan e.g. via exploration rule `LeftSideJNtoIdxLookup` can be disabled with undocumented trace flag 8726. Most often it will be more convenient to use the `FORCESEEK` hint, but this trace flag can be used in development or test to see if a good multi-index plan could be generated. A plan generating an unwanted *index union* can be prevented from doing so with trace flag 8727.
Josh Darnell (imported from SE)
Rather than focusing on how to improve a query like this, which is what the other answers are doing, I'm going to try to answer the question being asked: *why* doesn't the optimizer produce a plan like the one you've described (that scans the Users table, and then seeks into the two indexes on the Comments table). Here's your original query again (note I'm using `MAXDOP 2` just to simulate what I saw in your execution plans): ``` SELECT * FROM Users u LEFT JOIN Comments c ON u.Id = c.UserId OR u.Id = c.PostId WHERE u.DisplayName = 'alex' OPTION (MAXDOP 2); ``` And the plan: [![Screenshot of original left join plan]] - Scan of `dbo.Users` with residual predicate to get just the "alex" users - For each of those users, scan the `dbo.Comments` table and filter matches in the join operator - **Estimated cost: 293.161 optimizer units** One attempt to get the plan you want would be to try and *force* a seek on the `dbo.Comments` table: ``` SELECT * FROM Users u LEFT JOIN Comments c WITH (FORCESEEK) ON u.Id = c.UserId OR u.Id = c.PostId WHERE u.DisplayName = 'alex' OPTION (MAXDOP 2); ``` The plans looks like this: [![Screenshot of left join plan with hints]] - scan of the `dbo.Users` table (with a residual predicate to only get users named "alex"), - seek into each of the two indexes to get the requested Id values (which are unioned together) - followed up by a key lookup to get the rest of the columns (since we selected *) - **Estimated cost: 5.98731 optimizer units** So the answer is that the optimizer is definitely *capable* of producing such a plan. And it doesn't seem to be a cost-based decision (the seek plan looks much cheaper). My best guess is that this is just some kind of limitation in the optimizer's exploration process - it doesn't seem to favor converting a left join with an `OR` clause into an apply. This is *really* unfortunate in this particular case, as performance is dismal in the scan plan (the query takes 45 seconds on my machine) vs the apply plan (less than 1 second). As [Rob Farley helpfully points out](https://topanswers.xyz/databases?q=815#a968), using `APPLY` directly (potentially with a `UNION` as well) is a better approach to get the plan you're looking for - both of those produce the "better" version of this plan (the `FORCESEEK` version). I would say that "`OR` in an outer `JOIN`" is kind of a known anti-pattern, and should be avoided since it doesn't seem like the optimizer has great support for that type of query directly. : https://i.stack.imgur.com/FQApZ.png : https://i.stack.imgur.com/k1RaY.png
When you have a join, the Query Optimizer will consider how best to satisfy the predicates involved with the various join techniques. It doesn’t try to re-evaluate the query as if had been written with APPLY, which is what you kinda want here, where it would see the right hand side of the join as like a sub-query. You can try this youself, by doing something like: SELECT * FROM Users u OUTER APPLY ( SELECT * FROM Comments c WHERE u.Id = c.UserId OR u.Id = c.PostId ) c WHERE u.DisplayName = 'alex' ...but given than OR is annoyingly often not turned into a UNION, I’d prefer: SELECT * FROM Users u OUTER APPLY ( SELECT * FROM Comments c WHERE u.Id = c.UserId UNION SELECT * FROM Comments c WHERE u.Id = c.PostId ) c WHERE u.DisplayName = 'alex'