In a previous blog I talked about the need for DW Automation in order to achieve that holy grail of Agility, Extensibility and perhaps Robustness with our data architecture. This blog is now focusing on some different levels of maturity. What’s different about this maturity model and other ones is that its not necessary better to go up the model it all depends on how many DW solutions you build and how much you want to spend on tooling.
I’ve been involved in building DW Automation platforms and reviewing solutions for the last 20+ years, starting with DTS and then SSIS, then ADF and now Synapse SQL Pools. What I’ve noticed is that most consultancy practises builidng (E)DW’s can be fitted into the levels below
- Level 0 – Basic. Aka “Bunch of monkeys”
- Level 1 – Common Patterns/Templates and Methodology.
- Level 2 – Automated or Parmeterised Templates
- Level 3 – Meta data and formal enterprise features
- Level 4 – Dedicated IP and souce control framework
- Level 5 – Dedicated IP – Improved scope and documentation
- Level 6 – Full DWA Product. dedicated engineers.
To be honest I have never made it past level 4 and 3 in a lot of cases, but I try to get off level 1 and 2 as soon as a I can, and run away from level 0.
Level 0 – Basic. Aka “Bunch of monkeys”
All coding is bespoke by different developers or contractors. Some common patterns and agreement on languages and Architecture, but each developer is rally a silo.
Common for a once off project using contractors without a tight methodology or tooling. Say a naïve project manger or customer may just put a budget out to “hire me 10 developers”.
Agility can be slow and may be difficult to extend if implemented in spaghetti fashion.
Technical debt mounts up at quite a pace and a less mature consulting firm will rub there hands in glee as they throw more resources onto the fire.
Developers are constantly re-inventing the wheel, progress is slow and change is almost impossible across more than one small part of the solution.
Level 1 – Common Patterns/Templates and Methodology
A more mature team who does repeat projects. Can run into technical debt issues and inability to refactor as templates evolve. Sometimes start afresh and improve as move customer to customer or project to customer. Each project becomes a Greenfields “phoenix” learning from last project as the team gains experience.
At this point the DWA is more a methodology than a framework, but there is a much better chance that 2 data engineers will be following the same standards and arrive at same end result in the same way.
Any code sharing may be largely copy/paste, with technologies like ADF offering better ways to share or parameterise.
Level 2 – Automated or Parameterised Templates
The design patterns or templates are either highly parameterised or code generated so the developer is not doing as much copy and pasting and conformance increases.
Orchestration is typically manual via a control flow package like SSIS, ADF or Apache Airflow to sequence steps.
At this point productivity increases, especially on areas which are easy to automate such as Extract into Staging or Ingest into data lake. The team may be thinking of formalising or increasing scope of automation from just extract into other areas such as transform and load. They may also have started to automate one or more enterprise features such as:
- Error handling
Level 3 – Meta data driven solution with wider scope
The parameterised templates may become meta data driven and include a wider scope of the lifecycle. Basic parameterisation becomes too complex with processes and a move to static or dyanmic code generation. Be that TSQL, dynamic Json, Python, dot.net or anything between. Enterprise features become standardised and almost no developers are writing “fat”.
At this point the team is thinking about automating any task which is repetitive and deterministic. Going beyond the scope of just simple automation of table copy and deeper into the lifecycle.
Experienced data rngineers are truly productive, but the automation IP is very “rough” with different customers having different code bases of automation engines.
Only the author and his team “really” use the framework and it can still be customer specific, so its not really a product. More a loose IP framework.
No, if you have a meta data driven parameterised ADF package doing simple data movement like staging you are not at maturity level 3. We need to see somthing meatier like enterprise features, load into star schema and some common transforms automated.
Level 4 – Formal DWA Framework IP
Formal template-based approach to some aspects of ETL or modelling. Perhaps codes generation or simple repository. The key difference is that templates are now formal IP, perhaps in source control rather than “copied” project to project. The templates and IP now have a “golden” build used purely for testing and not related to any one customer or project. Perhaps on adventureworks or some public data set.
The team is now thinking more about scaling out and helping future projects get started. The DW automation IP now has its own soure code (not one per customer). A demo and unit test environment just for framework may exist.
However, the maintenance of this IP can be hap hazard and rely on the good nature and spare time of senior project consultants.
Level 5 – Dedicated IP – Improved scope and documentation
Automated meta data-based workflow solution. The templates are stored formally in a repository and dynamically generated using a meta data configuration database. The framework now has quite a wide scope beyond just extract. It may even include DevOps and more extensive documentation with samples to help onbaord new users.
The team/company is thinking very much more of treating the automation like a product than a framework. At this point is may become possible for some implementation to easily fall to non-data engineers such as data analysts, and new projects happen faster. More focus is on data and less on coding.
Level 6 – Full DWA Product
The implementation is much more sophisticated than the loose IP a consultancy firm has (step 2+), with: a GUI interface to modify meta data, automated testing, Installation product, Web site with examples and documentation .
The team that has invested a lot of R&D on DWA automaton. Usually with a view to major re-use or actually selling the IP. Substantial effort in documentation, testing and marketing of the framework
The DWA now had almost dedicated resoures and project management.
A Warning to Consultancy Firms
If you have developed your own DWA scripts or framework you may think “This is fantastic”, “I can make millions selling it as a product”.
My advice, dont do it, run a mile.
The sheer effort and cost to go from loose IP to profitable product is not to be sniffed at, especially if you do not have dedicated resources and funding.
I see so many times (including with my company) this becoming a back burner project than never really gets finished as other work is bringing home the bacon so to speak.
sure, release it as open source, but shy away from a full product unless you are willing to quit your day job, get no income and be a startup in an Industry that already has players.
The usual disclaimer, this is 100% just my opnion and experience. Your milage may vary.