When should the IGNORE_DUP_KEY option be used on an index?

sql-server sql-server-2019

6 years ago Triynko (imported from SE)

Someone said it’s preferable to craft your queries to avoid duplicate key exceptions, but I’m not convinced that’s more performant than just setting IGNORE_DUP_KEY = ON for the index.

My goal is to ensure a row or set of rows exists for one or more users before attempting to update those rows. I do this so that when an update statement like the one below affects no rows, I know it’s because the [Count] portion of the predicate wasn’t satisfied, as opposed to the row not existing at all (i.e. the [ID] portion of the predicate not being satisfied):

UPDATE [Inventory]
SET [Count] = [Count] + 1
WHERE [ID] = 3 
AND ([Count] + 1) <= @MaxInventory
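
To make the "no rows affected" check concrete, here is a minimal sketch that captures @@ROWCOUNT right after the update. The value given to @MaxInventory and the @Rows variable are hypothetical, and [Inventory] is assumed to have an integer [ID] key and an integer [Count] column.

```sql
-- Assumption: [Inventory] has an integer [ID] key and an integer [Count] column.
DECLARE @MaxInventory int = 10;  -- hypothetical limit
DECLARE @Rows int;

UPDATE [Inventory]
SET [Count] = [Count] + 1
WHERE [ID] = 3
AND ([Count] + 1) <= @MaxInventory;

-- Capture the row count immediately; most later statements reset @@ROWCOUNT.
SET @Rows = @@ROWCOUNT;

IF @Rows = 0
    -- Only unambiguous if row 3 is known to exist beforehand.
    PRINT 'No update: row 3 would exceed @MaxInventory.';
```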

I could run EXISTS(SELECT 1 FROM [Inventory] WHERE [ID] = 3) to check for that single row, and only insert the row if it does not exist. That simply avoids unnecessary inserts. The insert, if necessary, would still have to contend with concurrent transactions, so duplicate key exceptions can still occur.
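
A rough sketch of that check-then-insert approach, with the duplicate key error caught rather than prevented. The initial [Count] of 0 is an assumption, as is the column list of [Inventory].

```sql
-- Assumption: [Inventory] has columns [ID] (unique key) and [Count].
BEGIN TRY
    -- Skip the insert when the row is already there.
    IF NOT EXISTS (SELECT 1 FROM [Inventory] WHERE [ID] = 3)
        INSERT [Inventory] ([ID], [Count]) VALUES (3, 0);
END TRY
BEGIN CATCH
    -- 2627: PRIMARY KEY / UNIQUE constraint violation
    -- 2601: duplicate key row in a unique index
    -- A concurrent session may insert the row between the check and the insert,
    -- so these two errors are swallowed; anything else is re-raised.
    IF ERROR_NUMBER() NOT IN (2601, 2627)
    BEGIN
        THROW;
    END;
END CATCH;
```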

I’m curious whether it’s more performant to just turn on IGNORE_DUP_KEY in this scenario, rather than allowing the error to be thrown and caught. Specifically, I’m curious whether it’s as fast as, or possibly even faster than, running an exists check to just attempt to insert the record and let it ignore duplicate keys.
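
For reference, a small sketch of what the IGNORE_DUP_KEY alternative looks like. The table dbo.InventoryDemo and the constraint name are hypothetical, used only so the example does not touch the real [Inventory] table.

```sql
-- Hypothetical demo table; the option is set on the index behind the primary key.
CREATE TABLE dbo.InventoryDemo
(
    [ID]    int NOT NULL,
    [Count] int NOT NULL,
    CONSTRAINT PK_InventoryDemo PRIMARY KEY CLUSTERED ([ID])
        WITH (IGNORE_DUP_KEY = ON)
);

INSERT dbo.InventoryDemo ([ID], [Count]) VALUES (3, 0);  -- row is inserted
INSERT dbo.InventoryDemo ([ID], [Count]) VALUES (3, 0);  -- no error; warning "Duplicate key was ignored."

-- Only one row exists for ID 3.
SELECT COUNT(*) AS RowsForID3 FROM dbo.InventoryDemo WHERE [ID] = 3;
```

The second insert completes with a warning instead of raising error 2627, so no TRY...CATCH is needed around it.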

This becomes even more important when I’m checking for and initializing multiple records at once. For example, if I need to ensure records for thousands of users exist in a single update statement, the logic would be much simpler if I just ran that insert statement up-front, letting it ignore duplicate keys. Avoiding duplicates would be more complex, because I’d have to first query the table for which records don’t exist, then attempt to add just those records (again, ignoring duplicate keys). Just inserting may be faster, even if all the records exist.
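
The "insert only the missing records" variant can be sketched as a single statement. The @IDs table variable and its sample values are hypothetical, and the UPDLOCK and SERIALIZABLE hints are my addition (a commonly suggested way to narrow the race with concurrent inserts), not something the question specifies.

```sql
-- Hypothetical list of keys that must exist before the update runs.
DECLARE @IDs TABLE ([ID] int NOT NULL PRIMARY KEY);
INSERT @IDs ([ID]) VALUES (1), (2), (3);

-- Insert only the keys that are not already present.
-- Assumption: [Inventory] has columns [ID] (unique key) and [Count].
INSERT [Inventory] ([ID], [Count])
SELECT i.[ID], 0
FROM @IDs AS i
WHERE NOT EXISTS
(
    SELECT 1
    FROM [Inventory] AS inv WITH (UPDLOCK, SERIALIZABLE)
    WHERE inv.[ID] = i.[ID]
);
```

With IGNORE_DUP_KEY = ON on the key, the WHERE NOT EXISTS clause could be dropped entirely, which is the simplification the question is weighing.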

I could meet it halfway and check whether any of the records are missing, such as with a left join or a COUNT comparison, but why bother if the insert ignoring duplicate keys is just faster?
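
One way to read "meeting it halfway" is a cheap COUNT comparison that guards the insert, sketched below. The @IDs table variable is hypothetical, and the comparison assumes [ID] is unique in [Inventory].

```sql
-- Hypothetical list of keys that must exist.
DECLARE @IDs TABLE ([ID] int NOT NULL PRIMARY KEY);
INSERT @IDs ([ID]) VALUES (1), (2), (3);

-- Fewer matches than requested keys means at least one row is missing.
IF (SELECT COUNT(*) FROM [Inventory] AS inv JOIN @IDs AS i ON i.[ID] = inv.[ID])
   < (SELECT COUNT(*) FROM @IDs)
BEGIN
    -- Concurrent sessions can still insert the same keys at this point,
    -- so duplicate key handling is needed here as well.
    INSERT [Inventory] ([ID], [Count])
    SELECT i.[ID], 0
    FROM @IDs AS i
    WHERE NOT EXISTS (SELECT 1 FROM [Inventory] AS inv WHERE inv.[ID] = i.[ID]);
END;
```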

Is it a good idea to use IGNORE_DUP_KEY and just attempt inserts instead of bothering with checking for row existence ahead of time? If not, why?

Top Answer

6 years ago meme (imported from SE)

It’s definitely an atypical setting to enable for an index. I don’t think you’re going to find many people who jump up and talk about using it.

Of course, there are a couple helpful posts on the subject by Paul White:

If your concern is around upsert patterns (or something similar), this article by Michael Swart is really informative:

SQL Server UPSERT Patterns and Antipatterns

Answer #2

6 years ago Paul White (imported from SE)

Every feature has a use case, and IGNORE_DUP_KEY is no exception.

As I explain in IGNORE_DUP_KEY slower on clustered indexes, this option only potentially benefits performance (on a clustered index) when the number of duplicate key exceptions is sufficiently small.

The break-even point is system-dependent, and requires realistic testing to properly evaluate. Try not to make assumptions or judgements in advance. Test each of the robust implementation alternatives carefully, and choose the one that makes most sense in your scenario.

As an example of a reason for a performance surprise, consider that the index insertion point has to be located at some stage, so repeating that operation may be cheaper than expected. When SQL Server can use the “rowset sharing” optimization, there is even less overhead. The engine locates the insertion point once when checking for existence, and keeps that direct reference for the insert operation.

Though not directly relevant to the question, I should mention some caveats to using IGNORE_DUP_KEY:

- It is not respected by the insert activity of MERGE statements.
- It cannot be added to an existing index.
- It is not as ‘discoverable’ as an explicit existence test.

I generally prefer separate statements to MERGE, but you should test it for your intended usage. MERGE can combine an insert and update into a single statement, while taking advantage of the hole-filling optimization and rowset sharing.
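
For comparison, a minimal sketch of the single-statement MERGE form applied to the question's update. The @MaxInventory value, the starting [Count] of 1 for a new row, and the HOLDLOCK hint (a common guard against concurrent inserts) are my assumptions, not part of the answer.

```sql
-- Assumption: [Inventory] has columns [ID] (unique key) and [Count].
DECLARE @MaxInventory int = 10;  -- hypothetical limit, assumed to be at least 1

MERGE [Inventory] WITH (HOLDLOCK) AS t
USING (VALUES (3)) AS s ([ID])
    ON t.[ID] = s.[ID]
-- Existing row: increment only while the limit allows it.
WHEN MATCHED AND (t.[Count] + 1) <= @MaxInventory THEN
    UPDATE SET [Count] = t.[Count] + 1
-- Missing row: create it with the first increment already applied.
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([ID], [Count]) VALUES (s.[ID], 1);
```

Because MERGE ignores IGNORE_DUP_KEY for its insert activity, a duplicate key here would still raise an error rather than a warning.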
