I use the term “Checkpoint IO storm” to refer to issues caused by the checkpoint process “flooding” the disk subsystem. This is often diagnosed on OLTP systems as periodic high latency writes which occur at very regular intervals.
The basic problem is that SQL Server does not write data changes write to disk immediately (the MDF file), it will continue to only write to the log file and write dirty pages to the in memory buffer pool. Every now and then it will then flush the dirty pages to disk by the checkpoint process based on how long it thinks it would take to recover if the server crashed (the recovery interval).
For many customers this process goes un-noticed for other customers they suffer a very noticeable drop in service quality.
How do you Know you have a Checkpoint Issue ?
Well high latency periodically is a warning sign but the sure sign is when you can correlate the high latency to the checkpoint process using these performance counters:
– Buffer Manager\CheckPoint Pages/Sec
– Disk Secs / Write
– Disk Writes/Sec
As a test I ran a load test on a sample OLTP doing ecommerce transactions on the C: drive of my laptop and this is what they look like. You can see that my workload does about 105 IOPS and disk latency is usually pretty great (under 1ms as its SSD), but when the Checkpoint pages/sec kicks in the IOPS go up to 5,000 or so and disk latency goes up to 20ms. My critical inserts will now get 20ms of WRITELOG waits !!
Paul Randal goes into some detail in the checkpoint process on his Immersion course aimed at MCM candidates and also has a blog article on monitoring checkpoints further than the perfmon counters
How do you solve checkpoint issues
There are a few tools in your arsenal.
a) Reconfigure the disk subsystem to handle the burst smoother (if you can get past the SAN mafia).
b) Set the recovery interval on your instance This is roughly the number of minutes of data SQL Server will keep dirty before flushing. If you lower this the checkpoints will be more frequent. http://msdn.microsoft.com/en-us/library/ms191154.aspx
c) Make the checkpoint flush over a longer time to regulate the IO using the checkpoint command and the [checkpoint duration] flag. See http://msdn.microsoft.com/en-us/library/ms188748.aspx
Lets See this [checkpoint duration] thing in action
You need to be careful as if you specify a checkpoint duration too low, this will cause spikes and if you specify a period too long, SQL server will not be writing as fast as the pages are getting dirty so will need to do another unregulated checkpoint after the manual one!
As an example I ran the script below while a OLTP load was applied.
A bit better but not perfect ;-( you can see that the 5,10 and 20 second checkpoints for this particular workload still get a small latency spike (the red line). Not as bad as the 20ms, but 4-10ms all the same, not great for log writes.
I then tried the following script
Much better we can see that peak IOPS have gone down from 9,000 during the “IO storm” to under 3,000 and latency barely kisses 1ms. The SAN guys can buy us a Guinness…
So why don’t I go away and mess with this right now?
Two reasons. Firstly you need to know the workload super well to start messing with the checkpoint. The default settings are kind of self regulating, so can handle when the workload changes.
Secondly, this issue can more often be fixed by basic disk best practices such as not placing the data and log on the same spindles or LUN, or if the server has two HBA’s that are not muti-path, placing the data on one and the log on another,or if a big data load considering minimal logging.
One issue I have seen a few times is if the SAN runs out of write cache half way through the checkpoint. You can be in a world of pain as SQL Server will be pushing at potentially tens of thousands of IOPS to the cache and suddenly boom – the SAN can’t handle it as it runs out of cache and all hell breaks loose.