Synapse TSQL Tip 1 – Use Double Defensive Inserts

When loading a Star Schema Datawarehouse it is very common to need to insert rows into the a dimension based on exceptions. For example:

  • Inferred Members (aka early arriving facts)
  • Unknown Members (missing data)

A Typical Query may look something like this for inferred members

INSERT INTO DimAccount(AccountCode, AccountName, Inferred)
SELECT f.AccountCode, 'Account Missing',1
FROM staging.FactTransactions f
WHERE NOT EXISTS (SELECT * FROM DimAccount d WHERE d.AccountCode=f.AccountCode)
GROUP BY f.AccountCode

On a SQL Dedicated pool (SQLDW) this can take 2-4 seconds, whereas just the select takes only 300ms, despite the select not actually returning any rows 99.999% of the time. This is because the insert query is passed to all 60 distributions.

You can save time by using TSQL like below, with the extra IF NOT EXISTS

IF NOT EXISTS (SELECT f.AccountCode FROM staging.FactTransactions f WHERE NOT EXISTS (SELECT * FROM DimAccount d WHERE d.AccountCode=f.AccountCode) )
	INSERT INTO DimAccount(AccountCode, AccountName, Inferred)
	SELECT f.AccountCode, 'Account Missing',1
	FROM staging.FactTransactions f
	WHERE NOT EXISTS (SELECT * FROM DimAccount d WHERE d.AccountCode=f.AccountCode)
	GROUP BY f.AccountCode

Sure, its annoying to use more TSQL, but I would hope you are not hand coding this and have some fancy ETL or meta data generator to generate the template, and not doing this sort of basic stuff by hand.

It is 2021 and all that 😉

Leave a Reply