Substring join or additional table, which is faster? - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

Substring join or additional table, which is faster?

sql-server sql-server-2014 add tag

speedofround (imported from SE)

I have a case that come up regularly at work. I have many tables that use a 4 character string as a foreign key : `G191`

The `G` is a sort of category, the `19` is the year, and the `1` is an instance.
Getting all rows of category `G` is something we do very often. Usually like:

```
SELECT * FROM [Table] 
WHERE Left([ID], 1) = 'G'
```

There is a way to get this effect without manipulating a string, by joining to a table where this category is defined:

```
SELECT * FROM [Table]
JOIN [Categories] ON [Table].CategoryID = [Categories].CategoryID
WHERE [Categories].Letter = 'G'
```

My co worker insists that the first way is more performant, and rolls his eyes at me for doing it the second way.

Which one is better? Does joining by another table really add more work then checking the first character of a string?

Top Answer

meme (imported from SE)

Why Yours Is Better
--
In general, your pattern is a better idea. 

Performance will depend on indexes, predicate selectivity, and table size though.

The reason your pattern is a better idea comes down to the concept of SARGability, which is a fancy word for if your search arguments can be used as seek predicates, or even fully pushed predicates, (i.e. not being processed in Filter operators after accessing an index).

Some examples of where this can hurt in joins and where clauses are:

* function(column) = something
* column + column = something
* column + value = something
* column = @something or @something IS NULL
* column like ‘%something%’
* column = case when …

When you do stuff like this, your queries can end up with all sorts of bad side effects:

* Increased CPU (burn baby burn)
* Index Scans (when you could have Seeks)
* Implicit Conversion (if your predicates produce a different data type)
* Poor Cardinality Estimates (poking the optimizer in the eye)
* Inappropriate Plan Choices (because the optimizer is blind now, you jerk)
* Long Running Queries (yay job security!)

Better Options
---
SARGable options for what you're looking for would include:

`WHERE [ID] LIKE 'G%'`

or

`WHERE [ID] >= 'G' AND [ID] < 'H'`

An alternative solution would be to add a computed column in just the table you're searching:

    ALTER TABLE [Table] 
        ADD Lefty AS Left([ID], 1);
    
    CREATE INDEX ix_whatever 
        ON [Table] (CategoryID , Lefty);

Though like I said before, the performance difference may not be dramatic with smaller tables. 

It's also possible that this index won't be used since your example query is selecting all of the table columns, and this index doesn't cover them. But that's a story for a different day.

1 Answer