It's no secret that optimising distinct count in SSAS/MOLAP is painful. The usual optimisation techniques are covered in various papers, such as:
– INI file settings to increase PageSize
– Separate Measure Groups
– Partitioning by the distinct count item so that the min and max values of each partition don’t overlap with those of any other partition.
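As a rough illustration of the last point, here is a minimal Python sketch that splits a key range into contiguous, non-overlapping slices, one per partition. It assumes the distinct count item is an integer key with a known min and max; the function name `partition_ranges` and the sample values are hypothetical and not taken from any customer system. Real key distributions are often skewed, so equal-width ranges may not give equal-sized partitions.

```python
def partition_ranges(min_key, max_key, partitions):
    """Split [min_key, max_key] into contiguous, non-overlapping integer ranges.

    Assumption: the distinct count item is an integer key. Each returned
    (low, high) pair would back one partition's source query filter.
    """
    total = max_key - min_key + 1
    if partitions < 1 or partitions > total:
        raise ValueError("partitions must be between 1 and the number of keys")

    # Spread any remainder over the first few ranges so sizes differ by at most 1.
    base, extra = divmod(total, partitions)
    ranges = []
    start = min_key
    for i in range(partitions):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    for low, high in partition_ranges(1, 1000, 4):
        print(f"WHERE CustomerKey BETWEEN {low} AND {high}")
```

The `CustomerKey` column in the printed filter is only a placeholder for whatever item you are distinct counting.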
We recently did all of this for a customer but noticed one annoying thing. When we fired up SQL Profiler and ran a simple query, we were flooded with “Progress Report End” events, many of them hitting the same partition.
For one customer this generated over a million Profiler events, even though they only had 150 or so partitions. On the blown-up Adventure Works cube with 150 million rows, it generates some 1,500 events when there are only four partitions!
I chatted with Microsoft CSS and with Alex Whittles, who has done some interesting benchmarks on Distinct Count (http://www.purplefrogsystems.com/blog/2014/03/analysis-services-tabular-or-multidimensional-a-performance-comparison/).
The reason for this seemingly annoying flood is that Distinct Count outputs a progress report message PER SEGMENT, and it pretty much has to scan all segments. I’m cool with it having to scan all the segments (this is part of the challenge of the algorithm), but does it “really” have to send a message to Profiler every 64k rows or so?
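To get a feel for the numbers, here is a back-of-envelope Python sketch. It assumes exactly 65,536 rows per segment (my reading of "64k or so") and one progress report event per segment scanned. Both are assumptions, not confirmed engine behaviour, and the function name is hypothetical.

```python
import math

# Assumption: one segment holds 65,536 rows ("every 64k or so").
ASSUMED_ROWS_PER_SEGMENT = 65_536


def estimate_segment_events(row_count, rows_per_segment=ASSUMED_ROWS_PER_SEGMENT):
    """Estimate events for one full scan, assuming one event per segment."""
    if row_count < 0 or rows_per_segment < 1:
        raise ValueError("row_count must be >= 0 and rows_per_segment >= 1")
    return math.ceil(row_count / rows_per_segment)


if __name__ == "__main__":
    print(estimate_segment_events(150_000_000))  # 2289 under these assumptions
```

That lands in the same order of magnitude as the roughly 1,500 events seen on the blown-up Adventure Works cube. I would not expect an exact match, since the real segment size and the number of rows actually scanned may differ from what the sketch assumes. The point is simply that the event count scales with rows, not with the number of partitions.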
So far feedback from MS is that this behaviour is by design. I can’t help but think that such verbose instrumentation must hurt performance somewhere.
If you notice this, don’t panic: it’s by design. Of course, there are many funky ways to avoid a physical distinct count, but that’s another blog!
I filed a Connect item to ask whether the product team can tone down the verbosity of events for Distinct Count. Please upvote it if you come across this.