This instance hosts the SharePoint 2007 databases (SP). We have been experiencing numerous SELECT/INSERT deadlocks against one heavily utilized table within SP content database. I have narrowed down the resources involved, both processes are requiring locks on the non-clustered index. The INSERT needs an IX lock on the SELECT resource and the SELECT needs a S lock on the INSERT resource. The deadlock graph depicts and three resources, 1.) two from the SELECT (producer/consumer parallel threads), and 2.) the INSERT. I have attached the deadlock graph for your review. Because this is Microsoft code and table structures we cannot make any changes. However, I have read, on the MSFT SP site, that they recommend setting MAXDOP Instance level configuration option to 1. Since this instance is shared amongst many other databases/applications this setting cannot be disabled. Therefore, I decided to try and prevent these SELECT statements from going parallel. I know this is not a solution but more a temporary modification to help with troubleshooting. Therefore, I increased the “Cost Threshold for Parallelism” from our standard 25 to 40 upon doing so, even though the workload has not changed (SELECT/INSERT occurring frequently) the deadlocks have disappeared. My question is why? * SPID 356 INSERT has an IX lock on a page belonging to the non-clustered index * SPID 690 SELECT Execution ID 0 has S lock on a page belonging to the same non clustered index Now * SPID 356 wants an IX lock on SPID 690 resource but cannot obtain it because SPID 356 is being blocked by SPID 690 Execution ID 0 S lock * SPID 690 Execution ID 1 wants a S lock on SPID 356 resource but cannot obtain it because SPID 690 Execution ID 1 is being blocked by SPID 356 and now we have our deadlock. [Execution Plan can be found on my SkyDrive][1] [Full Deadlock Details can be found here][2] If someone can help me understand why I would really appreciate it. ### EventReceivers Table. Column|Type|Nullable|Length -|-|-|- Id|uniqueidentifier|no|16 Name|nvarchar|no|512 SiteId|uniqueidentifier|no|16 WebId|uniqueidentifier|no|16 HostId|uniqueidentifier|no|16 HostType|int|no|4 ItemId|int|no|4 DirName|nvarchar|no|512 LeafName|nvarchar|no|256 Type|int|no|4 SequenceNumber|int|no|4 Assembly|nvarchar|no|512 Class|nvarchar|no|512 Data|nvarchar|no|512 Filter|nvarchar|no|512 SourceId|tContentTypeId|no|512 SourceType|int|no|4 Credential|int|no|4 ContextType|varbinary|no|16 ContextEventType|varbinary|no|16 ContextId|varbinary|no|16 ContextObjectId|varbinary|no|16 ContextCollectionId|varbinary|no|16 --- index_name|index_description|index_keys -|-|- EventReceivers_ByContextCollectionId|nonclustered located on PRIMARY|SiteId, ContextCollectionId EventReceivers_ByContextObjectId|nonclustered located on PRIMARY|SiteId, ContextObjectId EventReceivers_ById|nonclustered, unique located on PRIMARY|SiteId, Id EventReceivers_ByTarget|clustered, unique located on PRIMARY|SiteId, WebId, HostId, HostType, Type, ContextCollectionId, ContextObjectId, ContextId, ContextType, ContextEventType, SequenceNumber, Assembly, Class EventReceivers_IdUnique|nonclustered, unique, unique key located on PRIMARY|Id [1]: https://skydrive.live.com/embed?cid=3D8B7CA73E2B4FF1&resid=3D8B7CA73E2B4FF1!115&authkey=AAtcrng2xmazQCs [2]: https://skydrive.live.com/embed?cid=3D8B7CA73E2B4FF1&resid=3D8B7CA73E2B4FF1!121&authkey=AMGXx15G4B3wQ3w
On the face of it, this looks like a classic [lookup deadlock][1]. The essential ingredients for this deadlock pattern are: - a `SELECT` query that uses a non-covering nonclustered index with a Key Lookup - an `INSERT` query that modifies the clustered index and then the nonclustered index The `SELECT` accesses the nonclustered index first, then the clustered index. The `INSERT` access the clustered index first, then the nonclustered index. Accessing the same resources in a different order acquiring incompatible locks is a great way to 'achieve' a deadlock of course. In this case, the `SELECT` query is: ![SELECT query][2] ...and the `INSERT` query is: ![INSERT query][3] Notice the green highlighted non-clustered indexes maintenance. We would need to see the serial version of the `SELECT` plan in case it is very different from the parallel version, but as Jonathan Kehayias notes in his guide to [Handling Deadlocks][4], this particular deadlock pattern is very sensitive to timing and internal query execution implementation details. This type of deadlock often comes and goes without an obvious external reason. Given access to the system concerned, and suitable permissions, I am certain we could eventually work out exactly why the deadlock occurs with the parallel plan but not the serial (assuming the same general shape). Potential lines of enquiry include checking for optimized nested loops and/or prefetching - both of which can internally [escalate the isolation level][5] to `REPEATABLE READ` for the duration of the statement. It is also possible that some feature of parallel index seek range assignment contributes to the issue. If the serial plan becomes available, I might spend some time looking into the details further, as it is potentially interesting. The usual solution for this type of deadlocking is to make the index covering, though the number of columns in this case might make that impractical (and besides, we are not supposed to mess with such things on SharePoint, I am told). Ultimately, the recommendation for serial-only plans when using SharePoint is there for a reason (though not necessarily a good one, when it comes right down to it). If the change in [cost threshold for parallelism][6] fixes the issue for the moment, this is good. Longer term, I would probably look to separate the workloads, perhaps using Resource Governor so that SharePoint internal queries get the desired `MAXDOP 1` behaviour and the other application is able to use parallelism. The question of exchanges appearing in the deadlock trace seems a red herring to me; simply a consequence of the independent threads owning resources which technically must appear in the tree. I cannot see anything to suggest that the exchanges themselves are contributing directly to the deadlocking issue. [1]: https://web.archive.org/web/20180123124508/http://sqlblog.com/blogs/alexander_kuznetsov/archive/2008/05/03/when-index-covering-prevents-deadlocks.aspx [2]: https://i.stack.imgur.com/Uxgk3.jpg [3]: https://i.stack.imgur.com/3dd65.jpg [4]: https://www.red-gate.com/simple-talk/sql/database-administration/handling-deadlocks-in-sql-server/ [5]: https://blogs.msdn.microsoft.com/craigfr/2007/06/07/read-committed-and-bookmark-lookup/ [6]: https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms188603(v=sql.105)
If this was a classic [lookup deadlock][1], the resource list will include both the Clustered Index and Non-Clustered Index. Typically the SELECT will hold a SHARED lock on the NC index and wait for a SHARED lock on the CI, meanwhile the INSERT will acquire an EXCLUSIVE lock on the CI and wait for an EXCLUSIVE lock on the NC. The resource list in the deadlock xml will list both these objects in this case. Since the deadlock graph only involves the NC Index we can rule out that option. Also, If this was a dead lock due to [Nested Loop Join with UNORDERED PREFETCH][2], the execution plan will tell us whether UNORDERED PREFETCH algorithm is used, which is again not the case here (see update below). That leaves us to assume that this is a [deadlock due to the Parallel Plan.][3] The deadlock graph is not rendered properly, but if you look at the Deadlock XML, you can see that two threads from the SELECT statement (SPID 690) are involved in the deadlock . The consumer thread is holding a SHARED lock on PAGE 1219645 and waiting on the producer on port801f8ed0 (e_waitPipeGetRow). The producer thread is waiting for a shared lock on PAGE 1155940. The INSERT statement is holding an IX lock on PAGE 1155940 and waiting for an IX lock on PAGE 1219645, resulting in a deadlock. I believe that a deadlock will be averted when using a serial plan for the SELECT statement since at no point it will require SHARED lock on more than one page. I also think that the serial plan will be almost same as the parallel plan (sans the parallelism operator). [UPDATED based on Paul's comment] Apparently the plan is using an OPTIMIZED Nested Loop algorithm <NestedLoops Optimized="true"> That explains why the SHARED locks are held until the end of the statement. REPEATABLE READ combined with parallel plan is more vulnerable to deadlock than a serial plan because the parallel plan might acquire and keep locks from different ranges of an index whereas the serial plan acquires locks in a more sequential manner. [1]: http://sqlindian.com/2012/06/29/deadlocks-due-to-different-access-paths/ [2]: http://sqlindian.com/2012/07/13/deadlock-on-select-due-to-unordered-prefetch/ [3]: http://sqlindian.com/2012/07/06/deadlock-due-to-parallelism/