How to prioritize rows with upper case? - Databases

TopAnswers Databases

Meta

Databases

TeX

Code Golf

APL

C++

.net

db<>fiddle

Java

*nix

PHP

PowerShell

Python

Rust

टेक्-मराठी

Typst

Web Client Dev

Web Server Dev

How to prioritize rows with upper case?

sql-server add tag

James

I've noticed some weird behavior in SQL Server, a case insensitive column can group two strings together:

    if object_id('Tempdb..#names') is not null begin drop table #names drop table #sheep end

    create table #names (fct_name varchar(255))
    insert into #names (fct_name)
    values 
    ('Angus McFife'),
    ('Angus Mcfife'),
    ('Willie McNulty')

    create table #sheep (farmer varchar(255), sheep int)
    insert into #sheep (farmer, sheep)
    values
    ('Angus McFife', 35),
    ('Angus Mcfife', 16),
    ('Willie McNulty', 52)


    select fct_name, sum(s.sheep) as herd
    from #names n
    inner join #sheep s on n.fct_name = s.farmer
    group by n.fct_name
      
This is causing a problem, because Angus will now get double counted:

    fct_name	herd
    Angus McFife	102
    Willie McNulty	52
    
Angus only has 35 sheep, but our report will show he has 102! Or maybe he actually has 51. Maybe 16? Are there two farmers called Angus? Who can say? Basically, the entire agricultural industry of Scotland will grind to a halt.

In our source system (an oracle DB for a surgery application) there are no ID's for these columns - they join on names, but they are case sensitive. I'd prefer not to make the SQL Server data warehouse case sensitive if I can avoid it - especially making just one column case sensitive could cause a lot of confusion down the road.

**QUESTION:** What is the best way to ensure that Angus doesn't get double counted?

Top Answer

Hannah Vernon

You could modify either the entire database to use a case-sensitive collation, or just the single column involved.  Probably easiest to modify just the column involved, but if case sensitivity is a concern, probably safest to modify the entire database.

Check the following example:

```
DROP TABLE #sheep;
DROP TABLE #names;
create table #names (fct_name varchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS)
insert into #names (fct_name)
values 
('Angus McFife'),
('Angus Mcfife'),
('Willie McNulty')

create table #sheep (farmer varchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS, sheep int)
insert into #sheep (farmer, sheep)
values
('Angus McFife', 35),
('Willie McNulty', 52),
('Angus Mcfife', 16)

select fct_name, sum(s.sheep) as herd
from #names n
inner join #sheep s on n.fct_name = s.farmer 
group by n.fct_name;
```

The results show:

```
╔════════════════╦══════╗
║    fct_name    ║ herd ║
╠════════════════╬══════╣
║ Angus McFife   ║   35 ║
║ Angus Mcfife   ║   16 ║
║ Willie McNulty ║   52 ║
╚════════════════╩══════╝
```

As you can see, the two Angus' have discrete rows. 

However, since you don't want to modify the table structures, you could instead modify the query to be case-sensitive, like this:

```
DROP TABLE #sheep;
DROP TABLE #names;
create table #names (fct_name varchar(255))
insert into #names (fct_name)
values 
('Angus McFife'),
('Angus Mcfife'),
('Willie McNulty')

create table #sheep (farmer varchar(255), sheep int)
insert into #sheep (farmer, sheep)
values
('Angus McFife', 35),
('Willie McNulty', 52),
('Angus Mcfife', 16)

select fct_name COLLATE SQL_Latin1_General_CP1_CS_AS, sum(s.sheep) as herd
from #names n
inner join #sheep s on n.fct_name = s.farmer COLLATE SQL_Latin1_General_CP1_CS_AS
group by n.fct_name COLLATE SQL_Latin1_General_CP1_CS_AS
```

Output:

```

╔══════════════════╦══════╗
║ (No column name) ║ herd ║
╠══════════════════╬══════╣
║ Angus McFife     ║   35 ║
║ Angus Mcfife     ║   16 ║
║ Willie McNulty   ║   52 ║
╚══════════════════╩══════╝
```

1 Answer