Those that forget the past are Condemned to repeat it
Introduction
When you forget the past, you are condemned to repeat it. This also holds true when you model history and history of history in your database. When you recreate a transaction history in a fully temporal information system with loading data from a source that does not record one you are alas forgetting the past. While there is no way to reconstruct the pas from thin air, when the source information system maintains a valid timeline (a user controllable timeline that among others allows for backdating), you can still recreate an approximation of the past by repeating your valid timeline into your transaction timeline.Forgetting the past
Temporal Data
Temporal Databases or temporal data modeling techniques allow you to record timelines. You can record when somebody lived at a certain address. We have 2 different classes of timelines; those that are connected to our own time, usually called logging, transaction, actual or asserted time, and time fully controllable by the end users, usually called valid, stated or user time. We talk about bi-temporal time when data is registered against 2 timelines, of each class one.Data warehousing and logging
Data warehouses that log all data (implement a transaction/logged timeline) complement source systems that only register stated time so that they record a bi-temporal time. This is typically done with e.g. a Data Vault. Of course, a Data vault can never show the logged time for the past that it did not log, even if the source recorded a stated/valid time. However, we can create an approximation of the past with the source stated time.
Repeating the past
There are 2 strategies to repair the past, extension of the first recorded occurence into the past, and transposing the source valid timeline to the past. The first strategy is simple and can always be applied, but is a gross simplification. The second strategy only works when there is a valid timeline of the past. It assumes that this valid timeline is a 1-1 with the transaction timeline.
We can for example read the time points as months, with T0 being January. T1=February and so on. The Value can be seen as e.g. Status values, 1='Created',2='Open',.. 6='Destroyed'. We can read the lower boxes in the picture as: we started to record in april that from jan to feb the status was Created.In May we recorded that in the the jan to feb time frame the status was 'Open'. This is currently still true (true until end of time).
Table Values:
a Visual representation of the data and the transformation:
Let's look at the 'SQL' Code to produe this past. We see that we create set of vertical oriented timeslices and a set of horizontal timelsieces to fill in the missing past.
Pseudo SQL Code
Vertical History:
Horizontal HistoryExample
Lets show a detailed example on how the repair the past by repeating it. we will show the second strategy, the first is left as an exercise to the reader. We start with a source table example that registers a stated and logged timeline for a certain value in the Data warehouse. We assume the source has been registering the data for some time. We also assume that for the past there is an 1-1 between the stated and logged time. This is a serious assumption, and the reconstructed past is certainly not auditable in the sense that it has been properly logged. We start processing the source data at time t3 (@Tran_hist_start).
Source Table:
LOAD DT
|
END DT
|
VALID FROM_DT
|
VALID TO_DT
|
VALUE
|
T3
|
T4
|
T0
|
T1
|
1
|
T3
|
-
|
T1
|
T2
|
3
|
T3
|
T5
|
T2
|
T3
|
4
|
T3
|
-
|
T3
|
-
|
6
|
T4
|
-
|
T0
|
T1
|
2
|
T5
|
-
|
T2
|
T3
|
5
|
This Source Table in a picture:
Transposed Situation
Lets transpose the valid timeline at time T3(@Tran_hist_start) and use that as a basis to reconstruct the unknown past. We assume here that in that past we don't know the future as known at T3.
Our bi temporal table will now look like this:
LOAD DT
|
END DT
|
VALID FROM_DT
|
VALID_ TO_DT
|
VALUE
|
T0
|
T1
|
T0
|
-
|
1
|
T1
|
T3
|
T0
|
T1
|
1
|
T1
|
T2
|
T1
|
-
|
3
|
T2
|
T3
|
T1
|
T2
|
3
|
T2
|
T3
|
T2
|
-
|
4
|
T3
|
T4
|
T0
|
T1
|
1
|
T3
|
-
|
T1
|
T2
|
3
|
T3
|
T5
|
T2
|
T5
|
4
|
T3
|
-
|
T3
|
-
|
6
|
T4
|
-
|
T0
|
T1
|
2
|
T5
|
-
|
T2
|
T3
|
5
|
a Visual representation of the data and the transformation:
Let's look at the 'SQL' Code to produe this past. We see that we create set of vertical oriented timeslices and a set of horizontal timelsieces to fill in the missing past.
Pseudo SQL Code
Vertical History:
Total History:
Conclusion
Assuming the user controlled timeline (valid timeline, stated timeline) does not diverge significantly from the unknown transaction/logged timeline you can reconstruct the non logged past in a information system/data warehouse by pivoting and substituting the target transaction timeline with the source user controlled timeline for the missing past. By repeating this information we can 'repair the past'. This is especially handy when querying the past. Note that we are now repeating the past from the present because we 'forgot' to register this past in the first place. The question if we should actually persist this past or just derive this information at query time is a separate performance discussion.
No comments:
Post a Comment