or
James
sql-server
I've noticed some weird behavior in SQL Server, a case insensitive column can group two strings together:

    if object_id('Tempdb..#names') is not null begin drop table #names drop table #sheep end

    create table #names (fct_name varchar(255))
    insert into #names (fct_name)
    values 
    ('Angus McFife'),
    ('Angus Mcfife'),
    ('Willie McNulty')

    create table #sheep (farmer varchar(255), sheep int)
    insert into #sheep (farmer, sheep)
    values
    ('Angus McFife', 35),
    ('Angus Mcfife', 16),
    ('Willie McNulty', 52)


    select fct_name, sum(s.sheep) as herd
    from #names n
    inner join #sheep s on n.fct_name = s.farmer
    group by n.fct_name
      
This is causing a problem, because Angus will now get double counted:

    fct_name	herd
    Angus McFife	102
    Willie McNulty	52
    
Angus only has 35 sheep, but our report will show he has 102! Or maybe he actually has 51. Maybe 16? Are there two farmers called Angus? Who can say? Basically, the entire agricultural industry of Scotland will grind to a halt.

In our source system (an oracle DB for a surgery application) there are no ID's for these columns - they join on names, but they are case sensitive. I'd prefer not to make the SQL Server data warehouse case sensitive if I can avoid it - especially making just one column case sensitive could cause a lot of confusion down the road.

**QUESTION:** What is the best way to ensure that Angus doesn't get double counted? 
Top Answer
Max
You could modify either the entire database to use a case-sensitive collation, or just the single column involved.  Probably easiest to modify just the column involved, but if case sensitivity is a concern, probably safest to modify the entire database.

Check the following example:

```
DROP TABLE #sheep;
DROP TABLE #names;
create table #names (fct_name varchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS)
insert into #names (fct_name)
values 
('Angus McFife'),
('Angus Mcfife'),
('Willie McNulty')

create table #sheep (farmer varchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS, sheep int)
insert into #sheep (farmer, sheep)
values
('Angus McFife', 35),
('Willie McNulty', 52),
('Angus Mcfife', 16)

select fct_name, sum(s.sheep) as herd
from #names n
inner join #sheep s on n.fct_name = s.farmer 
group by n.fct_name;
```

The results show:

```
╔════════════════╦══════╗
║    fct_name    ║ herd ║
╠════════════════╬══════╣
║ Angus McFife   ║   35 ║
║ Angus Mcfife   ║   16 ║
║ Willie McNulty ║   52 ║
╚════════════════╩══════╝
```

As you can see, the two Angus' have discrete rows. 

However, since you don't want to modify the table structures, you could instead modify the query to be case-sensitive, like this:

```
DROP TABLE #sheep;
DROP TABLE #names;
create table #names (fct_name varchar(255))
insert into #names (fct_name)
values 
('Angus McFife'),
('Angus Mcfife'),
('Willie McNulty')

create table #sheep (farmer varchar(255), sheep int)
insert into #sheep (farmer, sheep)
values
('Angus McFife', 35),
('Willie McNulty', 52),
('Angus Mcfife', 16)

select fct_name COLLATE SQL_Latin1_General_CP1_CS_AS, sum(s.sheep) as herd
from #names n
inner join #sheep s on n.fct_name = s.farmer COLLATE SQL_Latin1_General_CP1_CS_AS
group by n.fct_name COLLATE SQL_Latin1_General_CP1_CS_AS
```

Output:

```

╔══════════════════╦══════╗
║ (No column name) ║ herd ║
╠══════════════════╬══════╣
║ Angus McFife     ║   35 ║
║ Angus Mcfife     ║   16 ║
║ Willie McNulty   ║   52 ║
╚══════════════════╩══════╝
```
How to prioritize rows with upper case?
James
yea, my biggest worry would be if some analyst types where fct_name like '%fife%'
Max
Case-sensitivity, or more precisely, the lack-thereof, can lead to many unseen problems just like the ones you show in your question.   Modifying the column to be case sensitive means none of the code needs to change, unless you have something relying on the column being case insensitive.
James
oof, i guess making the column case sensitive might be the way to go. I'm just worried the report writers might not notice, or that we will have to throw in a case insensitive thing later on