This question is related to [my old question][1]. The below query was taking 10 to 15 seconds to execute:
SELECT [customer].[Customer name],[customer].[Sl_No],[customer].[Id]
FROM [company].dbo.[customer]
WHERE (Charindex('123456789',CAST([company].dbo.[customer].[Phone no] AS VARCHAR(MAX)))>0)
In some articles I saw that using `CAST` and `CHARINDEX` will not benefit from indexing. There are also some articles that say using `LIKE '%abc%'` will not benefit from indexing while `LIKE 'abc%'` will:
* http://bytes.com/topic/sql-server/answers/81467-using-charindex-vs-like-where
* https://stackoverflow.com/questions/803783/sql-server-index-any-improvement-for-like-queries
* http://www.sqlservercentral.com/Forums/Topic186262-8-1.aspx#bm186568
In my case I can rewrite the query as:
SELECT [customer].[Customer name],[customer].[Sl_No],[customer].[Id]
FROM [company].dbo.[customer]
WHERE [company].dbo.[customer].[Phone no] LIKE '%123456789%'
This query gives the same output as the previous one. I have created a nonclustered index for column `Phone no`. When I execute this query it runs in just **1 second**. This is a huge change compared with **14 seconds** previously.
How does `LIKE '%123456789%'` benefit from indexing?
Why do the listed articles state that it will not improve performance?
I tried rewriting the query to use `CHARINDEX`, but performance is still slow. Why does `CHARINDEX` not benefit from the indexing as it appears the `LIKE` query does?
Query using `CHARINDEX`:
SELECT [customer].[Customer name],[customer].[Sl_No],[customer].[Id]
FROM [Company].dbo.[customer]
WHERE ( Charindex('9000413237',[Company].dbo.[customer].[Phone no])>0 )
Execution plan:
![enter image description here][2]
Query using `LIKE`:
SELECT [customer].[Customer name],[customer].[Sl_No],[customer].[Id]
FROM [Company].dbo.[customer]
WHERE[Company].dbo.[customer].[Phone no] LIKE '%9000413237%'
Execution plan:
![LIKE query plan][3]
[1]: https://dba.stackexchange.com/questions/46835/query-taking-non-consistent-time-to-execute?noredirect=1#comment83125_46835
[2]: http://i.stack.imgur.com/7gSwu.png
[3]: http://i.stack.imgur.com/2bHFM.png
Top Answer
Paul White
> *How does LIKE '%123456789%' benefit from indexing?*
Only a little bit. The query processor can **scan** the whole nonclustered index looking for matches instead of the entire table (the clustered index). Nonclustered indexes are generally smaller than the table they are built on, so scanning the nonclustered index may be faster.
The downside, is that any columns needed by the query that are not included in the nonclustered index definition must be looked up in the base table, per row.
The optimizer makes a decision between scanning the table (clustered index) and scanning the nonclustered index with lookups, based on cost estimates. The estimated costs depend to a great extent on how many rows the optimizer **expects** your `LIKE` or `CHARINDEX` predicate to select.
> *Why do the listed articles state that it will not improve performance?*
For a `LIKE` condition that does **not** start with a wildcard, SQL Server can perform a **partial scan** of the index instead of scanning the whole thing. For example, `LIKE 'A%` can be correctly evaluated by testing only index records `>= 'A'` and `< 'B'` (the exact boundary values depend on collation).
This sort of query can use the seeking ability of b-tree indexes: we can go straight to the first record `>= 'A'` using the b-tree, then scan forward in index key order until we reach a record that fails the `< 'B'` test. Since we only need to apply the `LIKE` test to a smaller number of rows, performance is generally better.
By contrast, `LIKE '%A` cannot be turned into a partial scan because we don't know where to start or end; any record could end in `'A'`, so we cannot improve on scanning the whole index and testing every row individually.
> *I tried rewriting the query to use `CHARINDEX`, but performance is still slow. Why does `CHARINDEX` not benefit from the indexing as it appears the LIKE query does?*
The query optimizer has the **same choice** between scanning the table (clustered index) and scanning the nonclustered index (with lookups) in both cases.
The choice is made between the two based on **cost estimation**. It so happens that SQL Server may produce a different estimate for the two methods. For the `LIKE` form of the query, the estimate may be able to use special string statistics to produce a reasonably accurate estimate. The `CHARINDEX > 0` form produces an estimate based on a guess.
The different estimates are enough to make the optimizer choose a Clustered Index Scan for `CHARINDEX` and a NonClustered Index Scan with Lookups for the `LIKE`. If you force the `CHARINDEX` query to use the nonclustered index with a hint, you will get the same plan as for `LIKE`, and performance will be about the same:
SELECT
[Customer name],
[Sl_No],
[Id]
FROM dbo.customer WITH (INDEX (f))
WHERE
CHARINDEX('9000413237', [Phone no]) >0;
The number of rows processed at runtime will be the same for both methods, it's just that the `LIKE` form produces a more accurate estimation in this case, so the query optimizer chooses a better plan.
If you find yourself needing `LIKE %thing%` searches often, you might want to consider a technique I wrote about in [Trigram Wildcard String Search in SQL Server][1].
[1]: https://sqlperformance.com/2017/09/sql-performance/sql-server-trigram-wildcard-search