Now that MSFT have released Synapse as an Integrated framework of tools for Data Engineering we have two different ways to provision our enterprise data warehouse in a dedicated sql pool.
Which is the best one to use ?
- Firstly we can use the new Integrated experience with the dedicated pool owned by Synapse Workspace. In this experience we connect to a synapse SQL End point [workspace].sql.azuresynapse.net“and all management that was performed at the SQL Server level is now performed at the workspace level (Firewall, security, managed identity, etc)
- Secondly we have the classic standalone experience under a SQL Server object within azure, potential shared with other databases that are SQL DB or SQL DW (dedicated pools). In this experience we connect to the SQL End Point [server].database.windows.net, and all mangement such as firewall, AAD Admin, logging, security is performned via the server object.
You can migrate between both types with a simple create from backup, albeit having to be careful that any SQL logins in the master dont get orphaned.
Whats the difference ? Which ones for me ?
You can connect both experiences to SSMS, data studio or even use them as linked services in ADF or Synapse Workspace. When you do this there is no feature difference inside the engine in terms of TSQL and features. You wont really notice anything other than a different DNS suffix name to connect to.
(azuresynapse.net v database.windows.net)
The differences are more subtle:
- The old SQL Server object was really aimed at SQL DB and not dedicated pools, so you would see menu options such as DTUs, elastic pools, managed backups, failover groups and items that dont apply to a dedicted pool (sqldw)
- The standalone version is not automatically AAD integrated, although you can easily do this yourself.
- The Integrated version automatically configures Managed Identity for the analytical workspace (ADF) to connect to the dedicated pool and visa versa as required for say PolyBase and COPY INTO. The standalone version needs a bit of configuration to get there.
- REST and powershell APIs dont currently return a dedicated pool as a “SQL Database” query. This means for example that the Azure mobile App that I was using to pause/resume doesnt work yet for integrated mode.
(I assume MSFT is working on this and it will update eventually)
Not really a lot of diffferences. So I guess the choice is quite simple:
- If you dont want to use pipelines, spark, notebooks then standalone doesnt have these features.
- If you have lots of SQL DBs and want to manage your SQL DW like a SQL DB and see it along with your other databases then standalone is for you.
- If you have some 3rd party tool connecting to a dedicated pool (DevOps, monitoring, etc), then you may want to check compatibility with integrated mode. This API change certainly affected out ADF deployment pipelines.
- If you want an integrated experience with pipelines, spark, data bricks then Integrated mode is for you.
Other than that Integrated Mode is the goto option for new projects, its clear that the standalone mode is just really for backwards compatibility. You do get a more integrated experience as you work with other tools in the Synapse Workspace stack.
Another difference (as of 01/02/2021) is that reserved pricing is not yet in place on the integrated experience (Thanks Luke!)
“You can purchase reserved capacity for your Dedicated SQL pool (formerly SQL DW) resource, which is generally available. Reserved capacity pricing is currently not available for the new unified workspace experience in Azure Synapse Analytics.”
As a SQL Server professional, loosing the SQL Server object was a bit of a shock and did make me a bit nervous as to the implied de-emphasis on the relational engine I love. The words Tail wagging the Dog did pop into my head, as for many of our projects the SQL Pool is 90% of the work, especially if using ELT automation engines.
I had thought that maybe SQL DB and SQL DW skus would get merged somehow into a “scale up or scale out model” on a super future roadmap and mangement integration would focus more on this aspect to support DBAs working woth both.
OId school DBAs that don’t want to see or touch spark, Data bricks, pipelines, serverless, etc probably wont like this integrated mode, but many DBA roles are evolving to cover mangement of these objects too.
Onwards and upwards, when you are working in the wider stack of pipelines, spark, data bricks it does make a lot of sense to manage them in an Integrated Way.
All our new projects are using the new Integrated mode 😉