Can SQL Server create collisions in system generated constraint names?

sql-server sql-server-2014 sql-server-2008 add tag

jl6 (imported from SE)

I have an application which creates millions of tables in a SQL Server 2008 database (non clustered). I am looking to upgrade to SQL Server 2014 (clustered), but am hitting an error message when under load:

>There is already an object named ‘PK__tablenameprefix__179E2ED8F259C33B’ in the database

This is a system generated constraint name. It looks like a randomly generated 64-bit number. Is it possible that I am seeing collisions due to the large number of tables? Assuming I have 100 million tables, I calculate less than a 1-in-1-trillion chance of a collision when adding the next table, but that assumes a uniform distribution. Is it possible that SQL Server changed its name generation algorithm between version 2008 and 2014 to increase the odds of collision?

The other significant difference is that my 2014 instance is a clustered pair, but I am struggling to form a hypothesis for why that would generate the above error. 

I know creating millions of tables is insane. This is black box 3rd party code over which I have no control. Despite the insanity, it worked in version 2008 and now doesn’t in version 2014. 

On closer inspection, the generated suffix always seems to start with 179E2ED8 - meaning the random part is actually only a 32-bit number and the odds of collisions are a mere 1-in-50 every time a new table is added, which is a much closer match to the error rate I’m seeing!

Top Answer

Martin Smith (imported from SE)

> Can SQL Server create collisions in system generated constraint names?

This depends on the type of constraint and version of SQL Server.

    CREATE TABLE T1
    (
    A INT PRIMARY KEY CHECK (A > 0),
    B INT DEFAULT -1 REFERENCES T1,
    C INT UNIQUE,
    CHECK (C > A)
    )
    
    SELECT name, 
           object_id, 
    	   CAST(object_id AS binary(4)) as object_id_hex,
    	   CAST(CASE WHEN object_id >= 16000057  THEN object_id -16000057 ELSE object_id +2131483591 END AS BINARY(4)) AS object_id_offset_hex
    FROM sys.objects
    WHERE parent_object_id = OBJECT_ID('T1')
    ORDER BY name;
    
    drop table T1

## Example Results 2008


|           name           | object_id | object_id_hex | object_id_offset_hex |
|--------------------------|-----------|---------------|----------------------|
| CK__T1__1D498357         | 491357015 | 0x1D498357    | 0x1C555F1E           |
| CK__T1__A__1A6D16AC      | 443356844 | 0x1A6D16AC    | 0x1978F273           |
| DF__T1__B__1B613AE5      | 459356901 | 0x1B613AE5    | 0x1A6D16AC           |
| FK__T1__B__1C555F1E      | 475356958 | 0x1C555F1E    | 0x1B613AE5           |
| PK__T1__3BD019AE15A8618F | 379356616 | 0x169C85C8    | 0x15A8618F           |
| UQ__T1__3BD019A91884CE3A | 427356787 | 0x1978F273    | 0x1884CE3A           |

## Example Results 2017

|           name           | object_id  | object_id_hex | object_id_offset_hex |
|--------------------------|------------|---------------|----------------------|
| CK__T1__59FA5E80         | 1509580416 | 0x59FA5E80    | 0x59063A47           |
| CK__T1__A__571DF1D5      | 1461580245 | 0x571DF1D5    | 0x5629CD9C           |
| DF__T1__B__5812160E      | 1477580302 | 0x5812160E    | 0x571DF1D5           |
| FK__T1__B__59063A47      | 1493580359 | 0x59063A47    | 0x5812160E           |
| PK__T1__3BD019AE0A4A6932 | 1429580131 | 0x5535A963    | 0x5441852A           |
| UQ__T1__3BD019A981F522E0 | 1445580188 | 0x5629CD9C    | 0x5535A963           |


For default constraints, check constraints and foreign key constraints the last 4 bytes of the auto generated name are a hexadecimal version of the objectid of the constraint. As `objectid` are guaranteed unique the name must also be unique. [In Sybase too][1] these use `tabname_colname_objectid`

For unique constraints and primary key constraints Sybase uses 

> tabname_colname_tabindid, where tabindid is a string concatenation of
> the table ID and index ID

This too would guarantee uniqueness. 

SQL Server doesn't use this scheme.

In both SQL Server 2008 and 2017 it uses an 8 byte string at the end of the system generated name however the algorithm has changed as to how the last 4 bytes of that are generated.

In 2008 the last 4 bytes represent a signed integer counter that is offset from the `object_id` by `-16000057` with any negative value wrapping around to max signed int. (The significance of `16000057` is that this is [the increment applied between successively created `object_id`][2]). This still guarantees uniqueness.

On 2012 upwards I don't see any pattern at all between the object_id of the constraint and the integer obtained by treating the last 8 characters of the name as the hexadecimal representation of a signed int.

The function names in the call stack in 2017 shows that it now creates a GUID as part of the name generation process (On 2008 I see no mention of `MDConstraintNameGenerator`). I guess this is to provide some source of randomness. Clearly it isn't using the whole 16 bytes from the GUID in that 4 bytes that changes between constraints however.

![enter link description here][3]

I presume the new algorithm was done for some efficiency reason at the expense of some increased possibility of collisions in extreme cases such as yours. 

This is quite a pathological case as it requires the table name prefix and column name of the PK (insofar as this affects the 8 characters preceding the final 8) to be identical for tens of thousands of tables before it becomes probable but can be reproduced quite easily with the below.

    CREATE OR ALTER PROC #P
    AS
        SET NOCOUNT ON;
    
        DECLARE @I INT = 0;
    
    
        WHILE 1 = 1
          BEGIN
              EXEC ('CREATE TABLE abcdefghijklmnopqrstuvwxyz' + @I + '(C INT PRIMARY KEY)');
              SET @I +=1;
          END 
    
    GO
    
    EXEC #P

An example run on SQL Server 2017 against a newly created database failed in just over a minute (after 50,931 tables had been created)

> Msg 2714, Level 16, State 30, Line 15 There is already an object named
> 'PK__abcdefgh__3BD019A8175067CE' in the database. Msg 1750, Level 16,
> State 1, Line 15 Could not create constraint or index. See previous
> errors.


  [1]: http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc36272.1570/html/commands/X80969.htm
  [2]: https://raresql.com/2013/01/29/sql-server-all-about-object_id/
  [3]: https://i.stack.imgur.com/hNbgX.png

Answer #2

david browne microsoft (imported from SE)

>Assuming I have 100 million tables, I calculate less than a 1-in-1-trillion chance of a collision

Remember this is the "[birthday problem][1]".  You're not trying to generate a collision for a single given hash, but rather measuring the probability that none of the many pairs of values will collide.

So with N tables, there are N*(N-1)/2 pairs, so here about 10^16^ pairs.  If the probability of a collision is 2^-64^, the probability of a single pair not colliding is 1-2^-64^, but with so many pairs, the probability of having no collisions here is about ((1-2^-64^)^10^)^16^, or more like 1/10,000.  See eg https://preshing.com/20110504/hash-collision-probabilities/

And if it's only a 32-bit hash the probability of a collision crosses 1/2 at only 77k values.


  [1]: https://en.wikipedia.org/wiki/Birthday_problem

2 Answers