Updating Stats in Azure SQL Database

Recently we were having some issues with DTU utilization in Microsoft SQL Azure. One of the troubleshooting steps that was suggested was the update statistics and rebuild indexes. This is already being done and didn’t really solve the issue because we noticed that even after we stopped updating and rebuilding statistics everything was working fine. However, this did bring up an interesting scenario that I wanted to test out and that is how do you update statistics in Azure without affecting your DTUs. The important thing to understand here is that Azure bills you based on resource utilization and therefore even an operation such as updating statistics or rebuilding indexes counts against your DTU utilization. The problem with DTU however is that it doesn’t differentiate between user queries and administrative tasks when it’s executed by the end user which in this case would be your organization.

If you’re not careful when you update statistics for a large table there’s a good chance that you will exhaust DTUs and result in the database coming down until your DTU quota has been refreshed. This is important because from SQL 2016 onwards updating statistics can be a multi-threaded operation based on the threshold. If you’re not careful Azure might end up using excessive number of CPUs to query the large table and process its statistics. This will result in the DTU being exhausted and a spike of 100%. The way to overcome this is to SET MAXDOP =1 one to ensure that you have additional CPU’s available for user queries. Also, you need to ensure that you do these operations during a maintenance window where less users are connected and at the same time ensure that you don’t do full scan or resample too frequently as both of these will automatically result in extra CPU’s being assigned due to parallelism.

The below query can help identify the stale statistics

-- query to identify stats that need to be rebuilt by date
-- Copy paste the Output and execute as needed
select Distinct Query from (
SELECT top 1000 sp.stats_id, 
       stat.name, 
       filter_definition, 
       last_updated, 
       rows, 
       rows_sampled, 
       steps, 
       unfiltered_rows, 
       modification_counter,
	   'Update Statistics ['+SCHEMA_NAME(so.schema_id) +'].['+object_name(stat.Object_id) +'] ['+stat.name+ '] WITH FULLSCAN , MAXDOP =1' as Query
FROM sys.stats AS stat
     CROSS APPLY sys.dm_db_stats_properties(stat.object_id, stat.stats_id) AS sp
	 INNER JOIN sys.objects so on so.object_id =stat.object_id
WHERE stat.object_id in (select  object_id from sys.tables) 
and last_updated < getdate()-10 
order by modification_counter desc
) a 

Notice the CPU time for single threaded vs Multithreaded in the screenshots below.