SQL 2019 – Indirect Checkpoint

Checkpoint in SQL server is the process by which dirt pages are written to disk. However it had a flaw that cause issues and the solution of which is indirect checkpoint. In the beginning we had automatic checkpoints. Here we assume a recover interval of 60 secs. Then in a very crude explanation we measure the disk latency say 10 ms / write to identify the number of log records that can be processed within 60 sec considering a latency of 10ms. So 6000 log records per checkpoint run.

The problem is when one long running query modifies a large number of pages , in such cases even though the transaction log file has only 6000 log records the number of dirty pages could be 60,000. So when checkpoint runs it causes massive amount of IO and unpredictable performance. In order to circumvent this issue SQL Server has in direct checkpoint. Here we keep track of a Dirty page list (1 per database) and when this dirty page list is full we have a process similar to checkpoint writing pages to disk. Now we have threads that traverse each DPL partition and aggressively writes dirty pages to disk (basically many small writes rather than the previous one massive write).

We have one DPL partition per schedule. As a result it is possible that one query executing on one scheduler may result in a single partition of DPL having significantly large number of dirty pages than the others. Since we still have only one worker for each DPL partition background writer process this results in indirect checkpoint behaving poorly. This issue is more pronounced in tempdb where we often have large amount of DML.

In SQL 2019 this mechanism has been optimized to avoid spinlock contention error messages resulting from the above process. Previous solution was to set recovery interval of tempdb to 0 thereby switching it from indirect checkpoint to automatic checkpoint mode.

Please Consider Subscribing