How to reduce the impact of mismatched collations in MS SQL Server

Does different collation affect performance for queries in SQL Server?

The short answer is yes.

The details are a bit more nuanced. For example when using datatypes like Int, datetime there is no  issue even if the collations are different. This is simply because the values are not specific to a region and therefore there is no conversion required when enumerating the value from one collation to another. 

However, the same cannot be said for text column. When using columns with datatypes like varchar, nvarchar we have to be aware of the collation since the collation dictates the order of the data in the index. In these cases, it is incorrect to assume that Latin1_General_100_CI_AI and SQL_Latin1_General_CP1_CI_AI are the same either. While the collations dictate there is no case sensitivity or accent sensitivity it doesn’t mean they are the same because SQL Collation could technically be different than Windows collations. In such cases SQL has no choice but to perform the collation check and conversion at the time of query execution.

Prove it

The below script demonstrates the test when joining two tables, one (SOURCECOLLATE) with a Windows collation Latin1_General_100_CI_AI which is being joined to another table ([PerfCounter]) with SQL collation SQL_Latin1_General_CP1_CI_AI. The difference in execution plan performance clearly shows the impact of using the different collations. The indexes are added on the counter_name column for both the table.

-- Use the target database
USE madworks;
GO

/******************************************************************************************
-- Demo: Collation Handling and Performance in SQL Server
-- This script demonstrates:
--   1. Creating tables with explicit collations
--   2. Populating tables with system performance counter data
--   3. Creating indexes for performance
--   4. Comparing join performance with different collation handling
******************************************************************************************/

-- 1. Create a source table with a specific collation for the [Counter_name] column
IF OBJECT_ID('dbo.SOURCECOLLATE', 'U') IS NOT NULL DROP TABLE dbo.SOURCECOLLATE;
CREATE TABLE dbo.SOURCECOLLATE (
    ID INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    [Counter_name] VARCHAR(100) COLLATE Latin1_General_100_CI_AI NOT NULL
);
GO

-- 2. Insert distinct counter names from system DMV into the source table
INSERT INTO dbo.SOURCECOLLATE ([Counter_name])
SELECT DISTINCT counter_name
FROM sys.dm_os_performance_counters;
GO

-- 3. Create a nonclustered index on [Counter_name] for faster lookups
CREATE NONCLUSTERED INDEX NCIX_CntName ON dbo.SOURCECOLLATE ([Counter_name]);
GO

-- 4. Create a fact table with a different collation on [counter_name]
IF OBJECT_ID('dbo.PerfCounter', 'U') IS NOT NULL DROP TABLE dbo.PerfCounter;
CREATE TABLE dbo.PerfCounter (
    [object_name]    NCHAR(128) NOT NULL,
    [counter_name]   NCHAR(128) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL,
    [instance_name]  NCHAR(128) NULL,
    [cntr_value]     BIGINT NOT NULL,
    [cntr_type]      INT NOT NULL
) ON [PRIMARY];
GO

-- 5. Insert data into the fact table from the system DMV
--    The GO 100 batch will insert the data 100 times for demo purposes
INSERT INTO dbo.PerfCounter ([object_name], [counter_name], [instance_name], [cntr_value], [cntr_type])
SELECT [object_name], [counter_name], [instance_name], [cntr_value], [cntr_type]
FROM sys.dm_os_performance_counters;
GO 100

-- 6. Create a nonclustered index on [counter_name] in the fact table
CREATE NONCLUSTERED INDEX NCIX_CounterName ON dbo.PerfCounter ([counter_name]);
GO

/******************************************************************************************
-- 7. Performance Test: Join with explicit collation conversion on SOURCECOLLATE
******************************************************************************************/
DECLARE @q1 DATETIME = GETDATE();

SELECT a.Counter_name, b.counter_name
FROM dbo.SOURCECOLLATE a
INNER JOIN dbo.PerfCounter b
    ON a.Counter_name COLLATE SQL_Latin1_General_CP1_CI_AI = b.counter_name
WHERE a.Counter_name = 'Usage                                                                                                                           ';

SELECT DATEDIFF(ms, @q1, GETDATE()) AS [Elapsed_ms];
GO

/******************************************************************************************
-- 8. Performance Test: Join with explicit collation conversion on PerfCounter
******************************************************************************************/
DECLARE @q1 DATETIME = GETDATE();

SELECT a.Counter_name, b.counter_name
FROM dbo.SOURCECOLLATE a
INNER JOIN dbo.PerfCounter b
    ON a.Counter_name = b.counter_name COLLATE Latin1_General_100_CI_AI
WHERE a.Counter_name = 'Usage                                                                                                                           ';

SELECT DATEDIFF(ms, @q1, GETDATE()) AS [Elapsed_ms];
GO

/******************************************************************************************
-- Notes:
-- - Always specify collations explicitly when joining columns with different collations.
-- - Use appropriate data types (prefer NVARCHAR over NCHAR unless fixed length is required).
-- - Create indexes on join columns for better performance.
-- - Clean up demo tables after use if needed.
******************************************************************************************/
                                                                                                               

We fix the collation for the primary key table to make it match the collation for the foreign key table, in other words the smaller table is converted to the collation for the larger table. This results in an index scan on the primary key table , but it’s a small table so we suffer a smaller performance hit. Notice that the foreign key table has a Index Seek.  

In the below execution plan for the query


We change the collation of the foreign key table to that of the Primary key table. In other words, the data from the larger table is being converted to the collation for the smaller table. This results in the foreign key table performing an index scan since the Index becomes useless once the collation has been applied.

So, while there is an impact when collations are different there is are ways to mitigate the impact by ensuring the conversion has been applied sensibly. Always ensure the collation has been applied to the table with fewer rows.

Bonus round : – Is there a better way?

However, there is an even easier way to ensure collation changes do not impact the performance.  Normalize the data properly and use Int columns for the Primary key and Foreign key relationships. Remember the collations are not applicable for int columns. So you can still use that to perform joins and filter as needed without paying the performance penalty for mismatched collation.