Many folks build star schema data warehouses and the supporting ecosystem of Semantic Models, Business Intelligence and/or ML Analytics. This is a quick note on how to assess how well you are doing with automation, regardless of which tools you use: SSIS, ADF, SQL, Databricks, Synapse, or non-Microsoft tools. I recently ran a Twitter thread … Continue reading The need for DW Automation (1/3)
TLDR: The short answer is YES, if you have >= 60 million rows. In our tests, a poorly maintained columnstore was 100% slower due to trim fragmentation and 64% slower due to the delta rowstore with some updates. Obviously this is just a small test on only 60 million rows and … Continue reading Do we need to maintain Columnstores in SQL Pools ?
One recent realization for me while working with Synapse Dedicated SQL Pools is that while they have auto create stats, there is no auto update stats. So statistics will become stale faster than bread left in the toaster, rather than being updated after every 20% or so of changes as in traditional SQL Server. In addition, if you create an … Continue reading Finding Outdated and Missing Stats in Synapse SQL Pools
I wrote this in 2010, before Clustered Column Store Indexes, SQL Pools (so rule 3 looks a bit odd), and before the date data type was mainstream, but the rest of these still look quite Thou shalt use something that exists in your query as a partition key. Thou shalt not make up a surrogate partition … Continue reading Top 10 SQL Partition Commandments
Blast from the past. I wrote this in 2008, and at one point we had it made into a poster for the office, with the rule that on code review we would not say a word but just point to the poster. Not sure how well this has stood the test of time now it is 2021 so many … Continue reading Top 10 SSIS/DW Commandments
Synapse SQL Dedicated Pools (aka SQLDW) do not support comments in views or procs in the same way as standalone SQL. This is annoying, as comments can be very useful for tracking changes and lineage of objects, especially when used in conjunction with Schema Compare tools in Visual Studio. Everyone has seen that incident where … Continue reading Synpase SQL Tip 2 – Comments in Views/Procs
When loading a Star Schema Data Warehouse it is very common to need to insert rows into a dimension based on exceptions. For example: Inferred Members (aka early arriving facts) and Unknown Members (missing data). A typical query for inferred members may look something like this. On a SQL Dedicated Pool (SQLDW) this can take 2-4 … Continue reading Synapse TSQL Tip 1 – Use Double Defensive Inserts
While a lot of projects in Azure may be using SQLDB or SQLDW (aka Dedicated Pools) we do still have a lot of customers running SQL on a VM (aka IaaS). One common task is setting up data disks. Typically we may need anything from 4-20 data disks and combine them with storage spaces to … Continue reading Adding Disks to Azure VM with Powershell
In CI/CD Azure Synapse Analytics – part 1 we covered: setting up Git source control in Synapse Studio; the difference between the main collaboration branch and the workspace publish branch. In this blog we are going to cover: how to create build pipelines; deploying to the Synapse production workspace using DevOps release pipelines. Build Pipeline in DevOps: Go … Continue reading CI/CD For Azure Synapse Analytics – Part 2
Do you also wonder how to do continuous integration (CI) and continuous deployment (CD) for Synapse Analytics? But first, let's talk about the basics: what is CI/CD in simple terms? CI/CD is one of the best practices of agile methodology, and it enables development teams to deploy stable and tested code changes more frequently. This practice … Continue reading CI/CD for Azure Synapse Analytics – Part 1
Now that MSFT have released Synapse as an integrated framework of tools for Data Engineering, we have two different ways to provision our enterprise data warehouse in a dedicated SQL pool. Which is the best one to use? Firstly, we can use the new integrated experience with the dedicated pool owned by the Synapse Workspace. … Continue reading Standalone or Integrated flavour of your Dedicated Pool (sqldw)
Data Lake Storage to Synapse Analytics using Managed Service Identity (MSI) – COPY INTO and PolyBase
Why use MSI? Easy to authenticate to any Azure Active Directory supported service; provides limited visibility of the credentials; more secure than SQL authentication and less susceptible to hacking. Configuration steps: Create a storage account and enable Hierarchical namespace for Data Lake Storage Gen2. Within the storage account, create a container for the file system. Navigate … Continue reading Data Lake Storage to Synapse Analytics using Managed Service Identity (MSI) – COPY INTO and PolyBase