Saturday, October 6, 2012

Data Vault "in the trenches" track

"No theory survives contact with reality"


On the 21st, 22nd and 29th of November, the MATTER program will start with the first course of its Data Vault "in the trenches" track: "Advanced Data Vault Modeling". Since Data Vault and related techniques lend themselves quite well to automation, we wanted to offer deeper insight into these techniques in relation to modeling and automation. As a prelude to this first course, I'll blog about the track and its contents.

The Data Vault "in the trenches" track, while part of the MATTER program, actually predates it. We had already developed this track and decided it would be good to embed it in a broader program, which in turn became the MATTER program. The goal of this track is not to focus directly on the standards, but on the places where we see the real Data Vault casualties: in the trenches. This also meant we would aim our courses at BI veterans who already had real-life Data Vault experience. The reason is that we still saw Pyrrhic victories and outright defeats when Data Vault was chosen as part of BI initiatives, mainly due to a lack of real-world experience in all aspects of Data (Vault). Also, while experienced Data Vault specialists are available to keep those initiatives on track, they can't be everywhere, and not all projects have them aboard. Complex data warehouse initiatives do not become simple ones when employing Data Vault; they actually get more complex when serious Data Vault implementation experience is lacking. So we decided to help out by developing the Data Vault track that is now part of the MATTER program.

The Structure

The Data Vault "in the trenches" track is composed of three courses:
  1. Data Vault Modeling
  2. Data Vault and Data warehouse Architecture and ETL
  3. Advanced Data Vault
We had several proposals on how many courses we should have. There is a difficult trade-off between time and cost on the one hand and content on the other. In the end we decided to take 2 days for modeling and 3 days for the rest (architecture, ETL, DBMS, etc.). If anything was left over, we would make an advanced course to take on those subjects. The agendas of the courses are still growing as we speak, so the issue is not a lack of material but prioritization. The Advanced course is now set at 2 days.

In the trenches

When designing the track, we also evaluated whether to offer introductory courses. Our conclusions were as follows. Standard introductory training is not the best way to start a large Data Vault implementation on your own. Real-world experience will help, but there is a bit of a catch-22 when you try to gain it on your own. Real-world experience in other Anchor Style Modeling techniques, like Anchor Modeling or (generic) temporal data modeling, would help, but training and experience in those techniques are even rarer.
Controlled teaching and implementation seems to work well, but not everyone has such an opportunity.
Offering basic training and implementation support ourselves would seriously extend the program and require additional investment.

For those who are really new to Data Vault, we advise either self-study with the Data Vault book combined with some serious prototyping, or enrolling in (standard) Data Vault courses, either online or in a classroom. A very good way to learn the basics of Data Vault is to combine Data Vault teaching/training with an implementation on your own data warehouse ("on the job") under external coaching/supervision.

From there you should have enough experience to start digging the trenches (track).

The content

As you might have guessed, this track is not about your mother's Data Vault ;). Here we try to understand not the basics of Data Vault, but the FULL SPECTRUM of Anchor Style Modeling techniques (including a little Anchor Modeling) and the standards, methodologies and architectures that come with them. We will look at the extremes at both ends of the "Data Vault" solution space to learn what drives a good modeling and ETL approach. We'll also look at the tension between standardization, optimization and formalization of Anchor Style modeling techniques and their methodologies. We think all this detail is important because even small choices can have a substantial impact on your automation efforts!

(Advanced) Data Vault Modeling course

Below is a condensed agenda, since the track's "bullet sheet" is quite long at 11 pages (and still growing).

  • The Data Vault Definition Dilemma
    • We start with a discussion of (data modeling) approaches and methodologies, and the issues that arise when trying to describe and define them. We'll focus on the tension between standardization, optimization and formalization.
  • Data Vault Entity Types including reference tables
    • We'll do an in-depth, exhaustive review of all modeling constructs and transformation patterns possible in Data Vault or other Anchor Style modeling approaches, whether they are used, misused, unused or even problematic. We'll talk about structures like end-dated links, key satellites and key-less hubs. We will also cover basic temporal techniques like "stacking" and hybrids between Data Vault and T3NF.
  • Data quality issues and Handling Data MIA (e.g. “Null” Rules)
    • We'll try to be complete on all kinds of modeling issues surrounding Data Vault, from Nulls to non-unique keys, and how to handle them.
  • Modeling Styles, transformations, model analysis and change management
    • We'll cover, compare and classify all varieties of Anchor Style modeling techniques and approaches, like "Anchor Vault" and classic Data Vault. We'll also dive into model analysis and change management.
  • Inheritance and virtual/derived Data Vault entities
    • We'll cover the basics of derivation and sub-typing within Data Vault.

Note: the first scheduled course will also have a free third day in which we will discuss all kinds of subjects including (but not limited to) the Architecture and Advanced courses of this track. Don't hesitate to contact me with questions and suggestions for this extra day.


doucetrr said...

Hi Martijn - any plans to deliver MATTER in North America? Would be very excited to see it over here.

Unknown said...

I'll be doing 2 presentations at Data Modeling Zone 2012 in Baltimore that are closely related to the program (but not the whole program). So while I will cover some of it in seminars etc., running the whole program in the USA is currently not planned (alas).

johnnygay said...

I attended your Data Vault presentation at DMZ. It was the perfect adjunct to Dan's introduction. I'll be taking this one back to my fellow DAs at Moneygram. It appears to be a very promising approach to investigate further!

Unknown said...

Thanks John,

Glad it was helpful, it was fun doing the talk.

Keep me in the loop on how you're going with your prospective Data Vault.