Finding Outdated and Missing Stats in Synapse SQL Pools

One recent realization for myself working with Synapse Dedicated SQL Pools is that while they have auto create stats, there is no auto update stats. So statistics will become stale faster than your bread left in the toaster. Rather than updating every 20% or so like traditional SQL Server. In additon, if you create an empty table, the SQL pool will just assume 1,000 rows and pretty much never update statistics. We call this a missing statistic.

The biggest problem in locating these stale or missing statistics is that we dont have access to DMVs like sys.dm_db_stats_properties to track the estimated rows versus the actual rows and there is a connect item discussing this thanks to the super smart SQL MCM Jason Horner

https://feedback.azure.com/forums/307516-azure-synapse-analytics/suggestions/33000598-sys-dm-db-stats-properties

One red herring that people go down is thinking that locating out of date statistics has anything to do with the date the statistics was created or updated and trying to use STATS_DATE function to get the date of last update. A statistics could be 10 years old and still valid if the table hasnt changed, or it could be 24 hours old but a billion rows have been inserted causing skew.

The Answer: dm_db_partition_stats

The answer lies in this sample script from Microsoft which tracks out of date statistics at the partition level using sys.dm_db_partition_stats

https://github.com/Microsoft/sql-data-warehouse-samples/blob/main/samples/sqlops/MonitoringScripts/ImpactedTables

I’ve taken that concept, cleaned it up into a view, added by partition support and the first sample is here in a view called vTableStats

https://github.com/ProdataSQL/SynapseTools/blob/main/SqlPools/Maintenance/vTableStats.sql

Demo of dbo.vTableStats

Running this view on my sample EDW in a SQL Pool I can now locate all the missing and out date statistics

SELECT *  FROM [dbo].[vTableStats]
WHERE stat_info is not null

We can see one table which was created by using a CREATE TABLE and INSERT INTO (rather than CTAS), which the statistics think has 1,000 rows but actually has over 61 million rows and over 40 other potential statistic issues.

Whats Next for Stats Maintenance

Now that we can locate out dated and missing stats (or at least tables/partitions with them), its pretty easy to write a SProc to do stats maintenance and fix the issues. Working on than now!

I also have some sample code which can locate and delete redundant or overlapping statistics which coud be useful.

Leave a Reply