Friday, June 7, 2013

Data Vault vs Anchor Modeling: Who Is The One?

Introduction

Yesterday Data Vault and Anchor Modeling went head to head on the DWH Next generation Conference organized by BI-Podium and Vissers & van Baars recruitment. It was a great conference with over 300 attendees and lot's of sessions on Anchor Modeling and Data Vault



Anchor Modeling

Anchor Modeling has been invented by Lars Rönnbäck and Olle Regardt. It is a highly normalized anchor style modeling approach that has some aspects of 6NF. It looks somewhat similar to Data Vault, but there are a lot of 'gotcha's'. It is a closed business model driven approach with it's own simple business/information model technique and just like Data Vault uses a small sets of building blocks and adds time to the data. There have been several blogs on Anchor Modeling, by +Richard Steijn here, Hennie de Nooijer here and WorldOfIntelligence here.

Data Vault vs Anchor Modeling

Lars made a start with comparing Anchor Modeling and Data Vault here. Is there any truth to glean from the the specific items, their meaning and the score? Not really, because THE Data Vault flavour does not exist that we can compare with the tight definitions of Anchor Modeling. I found it more the Agile Data Vault style of +Hans Patrik Hultgren compared to Anchor Modeling. Many of the issues mentioned stem either from (past) best practices or lack of detailed standards on e.g. temporal data management. Most are not essential or irrevocably some differences run much deeper. Interestingly most techniques currently in use in Anchor Modeling(except the annex helper tables for bi-temporal modeling) are also used at (some) Data Vaults.

The Hidden Snag: Auditability & Adaptability

If fact, most aspects of Anchor Modeling we can apply to Data Vault or Anchor Vault as well, but auditability creates some serious issues. Anchor Models can never be guaranteed 100% audtiable in all circumstances, and proving Anchor Models are semantic equivalent with (arbitrary) source models requires some serious additional effort. This also means the transformations and loading routines have different issues that with a Data Vault, esp. on complex non auditable transformations (e.g. time line repairs). The flexibility of Data Vault flavors allows us to adapt to very different environments without losing the source data. The tightly defined Anchor Models can be easier to use due to their (internal) business model driven nature coupled with their high normalization. All this does not mean that source driven Anchor Models do not exist or are impossible. On the contrary, we see many people designing source driven Anchor Models, just as some create Business Model driven Data Vaults.

It's my party and I'll Vault how I want to.

It is clear to me that unless we address the issue of formally addressing data integration and source data models with integration modeling comparing Data Vault and Anchor Modeling is not really going to work. Instead we should focus on combining approaches. In fact my presentation on the DWH Next generation Conference showed a nice example of combining Anchor Modeling and Data Vault. As discussed in my post on data modeling styles they are just edge cases of a whole family. Since DV needs to work on a vastly larger range of architectures, it cannot be so tightly defined as Anchor Modeling.  It is this flexibility that proves that when we look at Data Vault in a flexible way, it should be better 'by definition' at capturing sources in an Achor Style, while Anchor Modeling is better at handling business models in an Anchor Style modeling technique. This is exactly how I see the usage of such patterns, not either/or but used both in the right context.

An Example

Here is the example to show we can use Anchor Modeling and Data Vault together effectively.

Challenge

žIntegrate disparate Business Keys
¡Keys not integrated
¡No Master Key
¡Want a 1-1 time variant mapping between system keys
¡Internal de-duplication should be possible
¡Business Rule Driven mappings
¡Efficient and flexible implementation

Resulting Model

The result of this challenge is the following model. It contains 3 normal hubs (with corresponding sats and links not shown). It contains 3 1-1 key mapping ties (end dated links) that register which Business Keys need to be unified. It contains 1 Anchor (a Business Key-less/Empty Hub) that connects it all together. Since none of the source keys is a true business key, and we do not want to invent our own, an Anchor is an ideal construct to use here since it does not contain a business key(just a surrogate key). We assume here that we have (complex) business rules that allow us to map the different keys to the central Anchor.

žSolution

Classical Single BK Source Driven Hubs
¡No Multi key or changing key etc. so no Anchor required
žEfficient 1-1 Key Maps/Ties
¡Efficient
¡Business Rule Driven, so stable cardinality
¡Dynamic and traceable
žEmpty Hub/Anchor to tie all Hubs to
¡No “Master” Business Key required

Architecture

The Architecture we see emerging clearly uses Data Vault for capturing sources, while it will use Anchor like constructs for connecting them together in a slightly more flexible way that Data Vault does. This way both techniques are used where their strengths lies.

A Unifying Style

While advances in Data Vault and Anchor Modeling are nice, i'd like to state that choice between and adaptability of the methods/approaches to circumstances by practitioners is also important. But for this we need to understand all the detailed differences between all these modeling techniques. Especially when we want to combine them. If we want this then we're back at a generic Information modeling or logical modeling technique to unify all Data Vault and Anchor Modeling styles. While Data Vault and Anchor Modeling try to seriously simplify the construction of Data Warehousing within their won context, WE still need to evaluate the approaches in our clients context. For this we need all the detail, so we can make a informed, independent, correct decision on which techniques to use. Alas, with Anchor Modeling and Data Vault using their own nomenclature, this does NOT make it easier for experts to understand and trade-off methodologies. Only the 'workers' doing the implementation have it far easier once an architecture and methodology has been chosen and automated. This realization was one of the key drivers for the MATTER educational program and also my research in unifying Anchor Modeling and Data Vault modeling using Fact Oriented information modeling and model to model transformations using FCO-IM.

In the end, Chosen Ones only exists in stories. For us mortals, as Codd said, only correct, consistent and complete and utter information hiding will save us from having to understand all these modeling techniques. In the mean time we'll be forced to thoroughly understand all aspects of data modeling, be it abstract information models or specific implementation modeling styles.

No comments: