When loading a Star Schema Datawarehouse it is very common to need to insert rows into the a dimension based on exceptions. For example:
- Inferred Members (aka early arriving facts)
- Unknown Members (missing data)
A Typical Query may look something like this for inferred members
INSERT INTO DimAccount(AccountCode, AccountName, Inferred)
SELECT f.AccountCode, 'Account Missing',1
FROM staging.FactTransactions f
WHERE NOT EXISTS (SELECT * FROM DimAccount d WHERE d.AccountCode=f.AccountCode)
GROUP BY f.AccountCode
On a SQL Dedicated pool (SQLDW) this can take 2-4 seconds, whereas just the select takes only 300ms, despite the select not actually returning any rows 99.999% of the time. This is because the insert query is passed to all 60 distributions.
You can save time by using TSQL like below, with the extra IF NOT EXISTS
IF NOT EXISTS (SELECT f.AccountCode FROM staging.FactTransactions f WHERE NOT EXISTS (SELECT * FROM DimAccount d WHERE d.AccountCode=f.AccountCode) )
INSERT INTO DimAccount(AccountCode, AccountName, Inferred)
SELECT f.AccountCode, 'Account Missing',1
FROM staging.FactTransactions f
WHERE NOT EXISTS (SELECT * FROM DimAccount d WHERE d.AccountCode=f.AccountCode)
GROUP BY f.AccountCode
Sure, its annoying to use more TSQL, but I would hope you are not hand coding this and have some fancy ETL or meta data generator to generate the template, and not doing this sort of basic stuff by hand.
It is 2021 and all that 😉