How to reduce the impact of mismatched collations in MS SQL Server

Does different collation affect performance for queries in SQL Server?

The short answer is yes.

The details are a bit more nuanced. For example when using datatypes like Int, datetime there is no  issue even if the collations are different. This is simply because the values are not specific to a region and therefore there is no conversion required when enumerating the value from one collation to another. 

However, the same cannot be said for text column. When using columns with datatypes like varchar, nvarchar we have to be aware of the collation since the collation dictates the order of the data in the index. In these cases, it is incorrect to assume that Latin1_General_100_CI_AI and SQL_Latin1_General_CP1_CI_AI are the same either. While the collations dictate there is no case sensitivity or accent sensitivity it doesn’t mean they are the same because SQL Collation could technically be different than Windows collations. In such cases SQL has no choice but to perform the collation check and conversion at the time of query execution.

Prove it

The below script demonstrates the test when joining two tables, one (SOURCECOLLATE) with a Windows collation Latin1_General_100_CI_AI which is being joined to another table ([PerfCounter]) with SQL collation SQL_Latin1_General_CP1_CI_AI. The difference in execution plan performance clearly shows the impact of using the different collations. The indexes are added on the counter_name column for both the table.

use madworks 
go 

 
 -- Create a table with a harded collation

CREATE TABLE  SOURCECOLLATE
(

ID  int identity(1,1) PRimARY KEY CLUSTERED,
[Counter_name] varchar(100)  COLLATE  Latin1_General_100_CI_AI NOT NULL
)
-- Insert some  data
INSERT INTO SOURCECOLLATE ([Counter_name])
SELECT DISTINCT counter_name FROM sys.dm_os_performance_counters
GO
-- Create index on Column CounterName 

CREATE NONCLUSTERED INDEX NCIX_CntName ON SOURCECOLLATE (Counter_name)

GO 
-- Create fact able for the join  

CREATE TABLE [dbo].[PerfCounter](
	[object_name] [nchar](128) NOT NULL,
	[counter_name] [nchar](128) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL ,
	[instance_name] [nchar](128) NULL,
	[cntr_value] [bigint] NOT NULL,
	[cntr_type] [int] NOT NULL
) ON [PRIMARY]
GO




-- Insert some more data  intot he fact  table  
INSERT INTO [dbo].[PerfCounter] ([object_name], [counter_name], [instance_name], [cntr_value], [cntr_type])
SELECT   [object_name], [counter_name], [instance_name], [cntr_value], [cntr_type]
FROM sys.dm_os_performance_counters
GO 100


Create NONCLUSTERED INDEX NCIX_CounterName on  [dbo].[PerfCounter]  ([counter_name])

GO 

declare @q1 datetime = getdate()
	select   a.Counter_name , b.counter_name from   [dbo].[SOURCECOLLATE] a 
	INNER JOIN  [dbo].[PerfCounter] b
	ON a.Counter_name COLLATE SQL_Latin1_General_CP1_CI_AI = b.counter_name 
	where a.Counter_name  ='Usage                                                                                                                           '
select DATEDIFF(ms,@q1 , getdate())

GO 


declare @q1 datetime = getdate()
	select  a.Counter_name , b.counter_name from   [dbo].[SOURCECOLLATE] a 
	INNER JOIN  [dbo].[PerfCounter] b
	ON a.Counter_name = b.counter_name  COLLATE Latin1_General_100_CI_AI
	where a.Counter_name ='Usage                    '
select DATEDIFF(ms,@q1 , getdate())

In the below execution plan for the query

declare @q1 datetime = getdate()
	select   a.Counter_name , b.counter_name from   [dbo].[SOURCECOLLATE] a 
	INNER JOIN  [dbo].[PerfCounter] b
	ON a.Counter_name COLLATE SQL_Latin1_General_CP1_CI_AI = b.counter_name 
	where a.Counter_name  ='Usage                                                                                                                           

We fix the collation for the primary key table to make it match the collation for the foreign key table, in other words the smaller table is converted to the collation for the larger table. This results in an index scan on the primary key table , but it’s a small table so we suffer a smaller performance hit. Notice that the foreign key table has a Index Seek.  

In the below execution plan for the query

declare @q1 datetime = getdate()
	select  a.Counter_name , b.counter_name from   [dbo].[SOURCECOLLATE] a 
	INNER JOIN  [dbo].[PerfCounter] b
	ON a.Counter_name = b.counter_name  COLLATE Latin1_General_100_CI_AI
	where a.Counter_name ='Usage

We change the collation of the foreign key table to that of the Primary key table. In other words, the data from the larger table is being converted to the collation for the smaller table. This results in the foreign key table performing an index scan since the Index becomes useless once the collation has been applied.

So, while there is an impact when collations are different there is are ways to mitigate the impact by ensuring the conversion has been applied sensibly. Always ensure the collation has been applied to the table with fewer rows.

Bonus round : – Is there a better way?

However, there is an even easier way to ensure collation changes do not impact the performance.  Normalize the data properly and use Int columns for the Primary key and Foreign key relationships. Remember the collations are not applicable for int columns. So you can still use that to perform joins and filter as needed without paying the performance penalty for mismatched collation.