While conducting a training recently I was asked which is better Count(*) or Count(id) , while we wanted to discuss the topic further we didn’t have time so I promised to write a blog about it.
Coming right down to it, in almost all cases Count(*) is better. Typically you won’t be able to tell the difference for small tables but once the tables grow large you can see the difference in the execution plan as well as the time statistics. Take for example a table called tickets which I will use below to demonstrate.
Tickets table contains 16 million rows.
Without any indexes when executing the below queries with a clean buffer pool
SELECT count(Scheduleid) FROM [dbo].Tickets OPTION (MAXDOP 1) CPU TIME = 2777 ms ,elapsed TIME = 8748 ms. SELECT count(*) FROM [dbo].Tickets OPTION (MAXDOP 1) CPU TIME = 1014 ms ,elapsed TIME = 1009 ms.
Without any indexes when executing the below queries with a prepopulated buffer pool
SELECT count(Scheduleid) FROM [dbo].Tickets OPTION (MAXDOP 1) CPU TIME = 2465 ms ,elapsed TIME = 2544 ms. SELECT count(*) FROM [dbo].Tickets OPTION (MAXDOP 1) CPU TIME = 1061 ms ,elapsed TIME = 1052 ms.
Now let’s look at the execution plan
SELECT count(Scheduleid) FROM [dbo].Tickets OPTION (MAXDOP 1)
SELECT count(*) FROM [dbo].Tickets OPTION (MAXDOP 1)
Notice from the above data how CPU time is almost always the same as elapsed time when using Count(*) and that it is always less than the option used for Count(id)
Once looking that the execution plan you will notice that when using count(*) that size of the row is always 9 bytes but when using a column it depends on the column datatype. Naturally at this point we can assume that the count is being done in two different ways depending on the query so the question is from how?
Now notice they change in behaviour once I make the scheduleid column not null within the tickets table.
SELECT count(Scheduleid) FROM [dbo].Tickets OPTION (MAXDOP 1) CPU TIME = 1030 ms ,elapsed TIME = 1041 ms. SELECT count(*) FROM [dbo].Tickets OPTION (MAXDOP 1) CPU TIME = 1014 ms ,elapsed TIME = 1023 ms.
The execution time becomes almost the same. The reason for this is because when using Count(*) the slot count is used within each page to determine the number of rows. But when using an individual column as the expression on which count happens SQL is forced to scan the column entirely to check for nulls even when the column doesn’t have nulls, it’s only after scanning the entire column that SQL truly knows. Hence the warning
Warning: Null value is eliminated by an aggregate or other SET operation.
Hope this clears things up a bit.
Please Consider Subscribing
