James
I've noticed some weird behavior in SQL Server, a case insensitive column can group two strings together:
if object_id('Tempdb..#names') is not null begin drop table #names drop table #sheep end
create table #names (fct_name varchar(255))
insert into #names (fct_name)
values
('Angus McFife'),
('Angus Mcfife'),
('Willie McNulty')
create table #sheep (farmer varchar(255), sheep int)
insert into #sheep (farmer, sheep)
values
('Angus McFife', 35),
('Angus Mcfife', 16),
('Willie McNulty', 52)
select fct_name, sum(s.sheep) as herd
from #names n
inner join #sheep s on n.fct_name = s.farmer
group by n.fct_name
This is causing a problem, because Angus will now get double counted:
fct_name herd
Angus McFife 102
Willie McNulty 52
Angus only has 35 sheep, but our report will show he has 102! Or maybe he actually has 51. Maybe 16? Are there two farmers called Angus? Who can say? Basically, the entire agricultural industry of Scotland will grind to a halt.
In our source system (an oracle DB for a surgery application) there are no ID's for these columns - they join on names, but they are case sensitive. I'd prefer not to make the SQL Server data warehouse case sensitive if I can avoid it - especially making just one column case sensitive could cause a lot of confusion down the road.
**QUESTION:** What is the best way to ensure that Angus doesn't get double counted?
Top Answer
Hannah Vernon
You could modify either the entire database to use a case-sensitive collation, or just the single column involved. Probably easiest to modify just the column involved, but if case sensitivity is a concern, probably safest to modify the entire database.
Check the following example:
```
DROP TABLE #sheep;
DROP TABLE #names;
create table #names (fct_name varchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS)
insert into #names (fct_name)
values
('Angus McFife'),
('Angus Mcfife'),
('Willie McNulty')
create table #sheep (farmer varchar(255) COLLATE SQL_Latin1_General_CP1_CS_AS, sheep int)
insert into #sheep (farmer, sheep)
values
('Angus McFife', 35),
('Willie McNulty', 52),
('Angus Mcfife', 16)
select fct_name, sum(s.sheep) as herd
from #names n
inner join #sheep s on n.fct_name = s.farmer
group by n.fct_name;
```
The results show:
```
╔════════════════╦══════╗
║ fct_name ║ herd ║
╠════════════════╬══════╣
║ Angus McFife ║ 35 ║
║ Angus Mcfife ║ 16 ║
║ Willie McNulty ║ 52 ║
╚════════════════╩══════╝
```
As you can see, the two Angus' have discrete rows.
However, since you don't want to modify the table structures, you could instead modify the query to be case-sensitive, like this:
```
DROP TABLE #sheep;
DROP TABLE #names;
create table #names (fct_name varchar(255))
insert into #names (fct_name)
values
('Angus McFife'),
('Angus Mcfife'),
('Willie McNulty')
create table #sheep (farmer varchar(255), sheep int)
insert into #sheep (farmer, sheep)
values
('Angus McFife', 35),
('Willie McNulty', 52),
('Angus Mcfife', 16)
select fct_name COLLATE SQL_Latin1_General_CP1_CS_AS, sum(s.sheep) as herd
from #names n
inner join #sheep s on n.fct_name = s.farmer COLLATE SQL_Latin1_General_CP1_CS_AS
group by n.fct_name COLLATE SQL_Latin1_General_CP1_CS_AS
```
Output:
```
╔══════════════════╦══════╗
║ (No column name) ║ herd ║
╠══════════════════╬══════╣
║ Angus McFife ║ 35 ║
║ Angus Mcfife ║ 16 ║
║ Willie McNulty ║ 52 ║
╚══════════════════╩══════╝
```