Introduction
In a Data Vault we connect all our (temporal) information to our business keys by creating hubs for them. The assumption here is that business keys represent stable identifiers for our data. We do know that sometimes business keys change. For this we have a SAME-AS Link that connects an initial business key to it’s successor. Together with a satellite tracking when this happened this technique is able to track the occasional business key mutation.
The Problem
When business keys not only mutate one-on-one but also merge or split you need to be able to report dependent information both on the original business keys as well on the newly merged business key without seriously restructuring your Data Vault. This is also possible using the same SAME_AS_LINK connecting source and target business keys and using a set of calculations that can be performed on a satellite that allow us to view these mutations in a consistent manner.
Scenarios
We consider 4 scenario’s:1. Key Replacement: 1 to 1 key mutation: 1 key is replaced by one other key
2. Key Split: 1 to Many mutation. 1 key is split into several keys
3. Key Merge:Many to 1 mutation. Several keys are merged into one target key
4. Multi Key Transitions:Many to Many mutation. Several keys are replaced with several other keys
We will not explore scenario 4 here because it is rare and rather complex. Rest assured this solution can handle this scenario as well. Implementing scenario 1 is left as an exercise to the reader.
All scenario’s we can also distinguish a key preserving scenario besides the non key preserving scenarios. With Key preserving scenario the source and target key(s) overlap.
1. Key Replacement with key preservation: Key mutates into itself, Identity transform.
2. Key Split. Source key is also one of the target keys
3. Key Merge. Target key is also one of the source keys
4. Multi Key Transitions: Source key set and target key set overlap
Example
As an Example we use the Hospitals example proposed by Martin Ekeblad. In there we have several Hospital (units) that are regularly reorganized. We assume the hospitals are frequently merged together or split into new units. We want to be able to track these actions consistently through time in a the corresponding Data Vault.
Source Data Model
We assume each hospital has a Hospital Number that is the business key identifying the hospital. We assume we have a table called Hospitals with several attributes and the business key. We also have a Hospital Mutation table recording the mergers and splits of the hospitals numbers. We also provide an insert/mutation date as basis for the Data Vault load date.
Source table Hospitals
Hospital_Number
|
mutation/insertion date
|
SU
|
2010-01-01
|
AL
|
2010-01-01
|
NU
|
2010-01-01
|
SUAL
|
2012-02-01
|
NU1
|
2012-02-01
|
NU2
|
2012-02-01
|
NU2a
|
2012-03-01
|
NU2b
|
2012-03-01
|
We also have a table with mutations of hospitals, Hospital mergers as well as splits. Here we record initial and target hospital number.
Source table Hospital_ Mutations
Initial_Hospital_Number
|
Target_Hospital_Number
|
mutation/insertion date
|
SU
|
SUAL
|
2012-02-01
|
AL
|
SUAL
|
2012-02-01
|
NU
|
NU1
|
2012-02-01
|
NU
|
NU2
|
2012-02-01
|
NU2
|
NU2a
|
2012-03-01
|
NU2
|
NU2b
|
2012-03-01
|
Data Vault Model
The Data Vault consists of 4 tables:
- Hospital Hub containing the Hospital Number
- A Same As Link connecting initial and target hospitals numbers
- A basic satellite SLNK Hospitals recording the changing hospital numbers
- A derived satellite SLNK Hospital Mutations in which we will capture all changes in an uniform and consistent way using business rules.
Table Details
HUB_Hospitals
HUB_Hospital_SQN
|
Hospital_Number
|
HUB_Hosptial_LDTS
|
1
|
SU
|
2010-01-01
|
2
|
AL
|
2010-01-01
|
3
|
NU
|
2010-01-01
|
4
|
SUAL
|
2012-02-01
|
5
|
NU1
|
2012-02-01
|
6
|
NU2
|
2012-02-01
|
7
|
NU2a
|
2012-03-01
|
8
|
NU2b
|
2012-03-01
|
SAME-AS-LINK table SLNK_Hospitals
SLNK_Hospitals_SQN
|
Initial_Hospital_SQN
|
New_Hospital_SQN
|
SOURCE
|
this LINK records the from and to Hospital as recorded by the source. This link has all the source/target records required to facilitate a full temporal consistent list of hospital mutations.
Basic satellite table SAT_SLNK_Hospitals
SLNK_Hospitals_SQN
|
SLNK_Hospitals_LDTS
|
SLNK_Hospitals_LEDTS
|
SOURCE
|
this table tracks only actual inserts from table Hospital_Mutations
Derived satellite table SAT_SLNK_Hospitals_Mutations
SLNK_Hospitals_SQN
|
Load Date
|
Load Date End
|
Proration
|
SOURCE
|
MutType
|
Keyrate
|
tracks all detailed merger and split info. Note that this is assumed here to be mostly *derived* info that is created by subsequent business rules (ETL) code.
We record the following additional fields:
- Proration
- Proration column to adjust amounts that need to be prorated across different hospital units after they have been split. This column is adjustable by using complex business rules.
- Mutation type
- Type of key mutation: Replace,Split,Merge
- Keyrate Division
- Calculated division rate of the mutation. It is used to calculate e basic prorations
- We have 3 business rules:
- Identity Rule: Will insert records in SAME AS LINK and derived sat linking keys back to themselves
- Proration Rule: Will calculate proration based on splitting. We assume an even proration for all target keys
- Transitivity Rule: Will calculate new LINK records in the SAME-AS LINK and sat that represent the transitive closure of repeated mergers or splits. This rule will also recursively calculate proration.
Processing Steps
We will show the actual process using several steps on a concrete example. We will use the data as shown at the source tables. The processing will be shown in 3 steps.
- Inserting the basic keys: Identity Rule
- Split and Merge Hospital Units
- Merge of SU and AL into SUAL on (2012-01-01)
- Split of NU into NU1,NU2
- Repeated split of NU2 into NU2a, NU2b
Also we will use Business Keys in the links for this example. Also non essential load dates have been omitted as well.
Step 1. Inserting the basic keys: Identity Rule
SLNK_Hospitals
1
|
SU
|
SU
|
Identity Rule
|
2
|
AL
|
AL
|
Identity Rule
|
3
|
NU
|
NU
|
Identity Rule
|
SAT_SLNK_Hospitals
SAT_SLNK_Hospitals_ Mutations
SQN
|
Load Date
|
Load Date End
|
Prorate
|
SOURCE
|
MutationType
|
Keyrate
| |||
1
|
2010-01-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
2
|
2010-01-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
3
|
2010-01-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
Step 2a. Merge of SU and AL into SUAL on (2012-01-01)
SLNK_Hospitals
SLNK_Hospitals_SQN
|
Initial_Hospital_SQN
|
Target_Hospital_SQN
|
Source
|
1
|
SU
|
SU
|
Identity Rule
|
2
|
AL
|
AL
|
Identity Rule
|
3
|
NU
|
NU
|
Identity Rule
|
4
|
SU
|
SUAL
|
Hospital_Mutations
|
5
|
SU
|
SUAL
|
Hospital_Mutations
|
6
|
SUAL
|
SUAL
|
Identity Rule
|
SAT_SLNK_Hospitals
SLNK_Hospitals_SQN
|
SLNK_Hospitals_LDTS
|
SLNK_Hospitals_LEDTS
|
SOURCE
|
4
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
5
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
SAT_SLNK_Hospitals_ Mutations
SQN
|
Load Date
|
Load Date End
|
Prorate
|
SOURCE
|
MutationType
|
Keyrate
| |||
1
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
2
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
3
|
2010-01-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
4
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
Merge
|
1
| |||
5
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
Merge
|
1
| |||
6
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
Step 2b. Split of NU into NU1,NU2
SLNK_Hospitals
SLNK_Hospitals_SQN
|
Initial_Hospital_SQN
|
New_Hospital_SQN
|
Source
|
1
|
SU
|
SU
|
Identity Rule
|
2
|
AL
|
AL
|
Identity Rule
|
3
|
NU
|
NU
|
Identity Rule
|
4
|
SU
|
SUAL
|
Hospital_Mutations
|
5
|
SU
|
SUAL
|
Hospital_Mutations
|
6
|
SUAL
|
SUAL
|
Identity Rule
|
7
|
NU
|
NU1
|
Hospital_Mutations
|
8
|
NU
|
NU2
|
Hospital_Mutations
|
9
|
NU1
|
NU1
|
Identity Rule
|
10
|
NU2
|
NU2
|
Identity Rule
|
SAT_SLNK_Hospitals
SLNK_Hospitals_SQN
|
SLNK_Hospitals_LDTS
|
SLNK_Hospitals_LEDTS
|
SOURCE
|
4
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
5
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
7
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
8
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
SAT_SLNK_Hospitals_ Mutations
SQN
|
Load Date
|
Load Date End
|
Prorate
|
SOURCE
|
MutationType
|
Keyrate
| |||
1
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
2
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
3
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
4
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
Merge
|
1
| |||
5
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
Merge
|
1
| |||
6
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
7
|
2012-02-01
|
9999-12-31
|
50
|
Proration Rule
|
Split
|
2
| |||
8
|
2012-02-01
|
9999-12-31
|
50
|
Proration Rule
|
Split
|
2
| |||
9
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
10
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
Step 3. Split of NU2 into NU2a, NU2b
SLNK_Hospitals
SLNK_Hospitals_SQN
|
Initial_Hospital_SQN
|
New_Hospital_SQN
|
Source
|
1
|
SU
|
SU
|
Identity Rule
|
2
|
AL
|
AL
|
Identity Rule
|
3
|
NU
|
NU
|
Identity Rule
|
4
|
SU
|
SUAL
|
Hospital_Mutations
|
5
|
SU
|
SUAL
|
Hospital_Mutations
|
6
|
SUAL
|
SUAL
|
Identity Rule
|
7
|
NU
|
NU1
|
Hospital_Mutations
|
8
|
NU
|
NU2
|
Hospital_Mutations
|
9
|
NU1
|
NU1
|
Identity Rule
|
10
|
NU2
|
NU2
|
Identity Rule
|
11
|
NU2
|
Nu2a
|
Hospital_Mutations
|
12
|
NU2
|
Nu2b
|
Hospital_Mutations
|
13
|
NU2a
|
NU2a
|
Identity Rule
|
14
|
NU2b
|
NU2b
|
Identity Rule
|
15
|
NU
|
NU2a
|
Transitivity Rule
|
16
|
NU
|
NU2b
|
Transitivity Rule
|
SAT_SLNK_Hospitals
SLNK_Hospitals_SQN
|
SLNK_Hospitals_LDTS
|
SLNK_Hospitals_LEDTS
|
SOURCE
|
4
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
5
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
7
|
2012-02-01
|
9999-12-31
|
Hospital_Mutations
|
8
|
2012-02-01
|
2012-03-01
|
Hospital_Mutations
|
11
|
2012-03-01
|
9999-12-31
|
Hospital_Mutations
|
12
|
2012-03-01
|
9999-12-31
|
Hospital_Mutations
|
SAT_SLNK_Hospitals_ Mutations
SQN
|
Load Date
|
Load Date End
|
Prorate
|
SOURCE
|
MutationType
|
Keyrate
| |||
1
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
2
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
3
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
4
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
Merge
|
1
| |||
5
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
Merge
|
1
| |||
6
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
7
|
2012-02-01
|
9999-12-31
|
50
|
Proration Rule
|
Split
|
2
| |||
8
|
2012-02-01
|
2012-03-01
|
50
|
Proration Rule
|
Split
|
2
| |||
9
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
10
|
2012-02-01
|
2012-03-01
|
100
|
Identity Rule
|
Replace
|
1
| |||
11
|
2012-03-01
|
9999-12-31
|
50
|
Proration Rule
|
Split
|
2
| |||
12
|
2012-03-01
|
9999-12-31
|
50
|
Proration Rule
|
Split
|
2
| |||
13
|
2012-03-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
14
|
2012-03-01
|
9999-12-31
|
100
|
Identity Rule
|
Replace
|
1
| |||
15
|
2012-03-01
|
9999-12-31
|
25
|
Transitivity Rule
|
Split
|
4
| |||
16
|
2012-03-01
|
9999-12-31
|
25
|
Transitivity Rule
|
Split
|
4
| |||
Reporting Example
How to use these tables in actual reporting:
Assume we have the following table Invoices (The format, DV or dimensional is not important right now)
Table Invoices
Invoice#
|
Amount
|
Hospital
|
0001
|
100
|
NU
|
And we want to prorate this amount to a Hospital given the reference date of 2012-03-01
Query:
Query:
SELECT invoice#,
amount * sat.proration / 100 amt,
inv.hospital AS Original_Hospital,
slnk.new_hospital_sqn AS Current_Hospital
FROM invoices inv
INNER JOIN slnk_hospitals slnk
ON slnk.initial_hospital_sqn = inv.hospital
INNER JOIN sat_slnk_hospitals_split_proration sat
ON sat.slnk_hospitals_sqn = slnk.slnk_hospitals_sqn
WHERE "2012-03-01" BETWEEN slnk_hospitals_ldts AND slnk_hospitals_ledts
amount * sat.proration / 100 amt,
inv.hospital AS Original_Hospital,
slnk.new_hospital_sqn AS Current_Hospital
FROM invoices inv
INNER JOIN slnk_hospitals slnk
ON slnk.initial_hospital_sqn = inv.hospital
INNER JOIN sat_slnk_hospitals_split_proration sat
ON sat.slnk_hospitals_sqn = slnk.slnk_hospitals_sqn
WHERE "2012-03-01" BETWEEN slnk_hospitals_ldts AND slnk_hospitals_ledts
This query will produce:
Invoice#
|
Prorated Amount
|
Original_Hospital_Number
|
Current_Hospital_Number
|
0001
|
50
|
NU
|
NU1
|
0001
|
25
|
NU
|
NU2a
|
0001
|
25
|
NU
|
NU2b
|
An alternative, a Key Mutation Satellite
As an alternative a KEY_MUTATION_SAT can be deployed. this is more efficient but less tracable:
Example
SAT_Hospital_TRANS
SQN
|
Target Hopsital
|
Load Date
|
Load Date End
|
Proration
|
Source
|
1
|
SU
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
2
|
AL
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
3
|
NU
|
2010-01-01
|
2012-02-01
|
100
|
Identity Rule
|
1
|
SUAL
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
1
|
SUAL
|
2012-02-01
|
9999-12-31
|
100
|
Hospital_Mutations
|
4
|
SUAL
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
3
|
NU1
|
2012-02-01
|
9999-12-31
|
50
|
Proration Rule
|
3
|
NU2
|
2012-02-01
|
2012-03-01
|
50
|
Proration Rule
|
5
|
NU1
|
2012-02-01
|
9999-12-31
|
100
|
Identity Rule
|
6
|
NU2
|
2012-02-01
|
2012-03-01
|
100
|
Identity Rule
|
6
|
NU2a
|
2012-03-01
|
9999-12-31
|
50
|
Proration Rule
|
6
|
NU2b
|
2012-03-01
|
9999-12-31
|
50
|
Proration Rule
|
7
|
NU2a
|
2012-03-01
|
9999-12-31
|
100
|
Identity Rule
|
8
|
NU2b
|
2012-03-01
|
9999-12-31
|
100
|
Identity Rule
|
3
|
NU2a
|
2012-03-01
|
9999-12-31
|
25
|
Transitivity Rule
|
3
|
NU2b
|
2012-03-01
|
9999-12-31
|
25
|
Transitivity Rule
|
This is exactly the same set, but now connected to the base hub and using an alternate business key
If business key splits are not expected this scenario, or a end dated one to many link can also be employed.
End dating
In key preserving scenarios when doing mergers it just means we do not need to do end dating on the original key. When splitting we need to add a new record for the key with the correct proration for the split.
Virtualizing the business rule and sparse population
We could opt for virtualizing parts of the business rules and creating a sparse mutation satellite. Esp the transitivity and identity rule are easily derivable. Just omitting them means the reporting queries become more complex. Specially when there are a LOT of business keys and the mutation rate is high.