Saturday, June 1, 2013

Repeating the Past: Repairing your history in with temporal data.

File:George Santayana.jpg
Those that forget the past are Condemned to repeat it
^ George Santayana (1905) Reason in Common Sense, volume 1 of The Life of Reason

Introduction

When you forget the past, you are condemned to repeat it. This also holds true when you model history and history of history in your database. When you recreate a transaction history in a fully temporal information system with loading data from a source that does not record one you are alas forgetting the past. While there is no way to reconstruct the pas from thin air, when the source information system maintains a valid timeline (a user controllable timeline that among others allows for backdating), you can still recreate an approximation of the past by repeating your valid timeline into your transaction timeline.

Forgetting the past

Temporal Data

Temporal Databases or temporal data modeling techniques allow you to record timelines. You can record when somebody lived at a certain address. We have 2 different classes of timelines; those that are connected to our own time, usually called logging, transaction, actual or asserted time, and time fully controllable by the end users, usually called valid, stated or user time. We talk about bi-temporal time when data is registered against 2 timelines, of each class one.

Data warehousing and logging

Data warehouses that log all data (implement a transaction/logged timeline) complement source systems that only register stated time so that they record a bi-temporal time. This is typically done with e.g. a Data Vault. Of course, a Data vault can never show the logged time for the past that it did not log, even if the source recorded a stated/valid time. However, we can create an approximation of the past with the source stated time.

Repeating the past

There are 2 strategies to repair the past, extension of the first recorded occurence into the past, and transposing the source valid timeline to the past. The first strategy is simple and can always be applied, but is a gross simplification. The second strategy only works when there is a valid timeline of the past. It assumes that this valid timeline is a 1-1 with the transaction timeline.

Example

Lets show a detailed example on how the repair the past by repeating it. we will show the second strategy, the first is left as an exercise to the reader. We start with a source table example that registers a stated and logged timeline for a certain value in the Data warehouse. We assume the source has been registering the data for some time. We also assume that for the past there is an 1-1 between the stated and logged time. This is a serious assumption, and the reconstructed past is certainly not auditable in the sense that it has been properly logged. We start processing the source data at time t3 (@Tran_hist_start).

Source Table:
LOAD DT
END DT
VALID FROM_DT
VALID TO_DT
VALUE
T3
T4
T0
T1
1
T3
-
T1
T2
3
T3
T5
T2
T3
4
T3
-
T3
-
6
T4
-
T0
T1
2
T5
-
T2
T3
5

This Source Table in a picture:

We can for example read the time points as  months, with T0 being January. T1=February and so on. The Value can be seen as e.g. Status values, 1='Created',2='Open',.. 6='Destroyed'. We can read the lower boxes in the picture as: we started to record in april that from jan to feb the status was Created.In May we recorded that in the the jan to feb time frame the status was 'Open'. This is currently still true (true until end of time).

Transposed Situation

Lets transpose the valid timeline at time T3(@Tran_hist_start) and use that as a basis to reconstruct the unknown past. We assume here that in that past we don't know the future as known at T3.
Our bi temporal table will now look like this:

Table Values:
LOAD DT
END DT
VALID FROM_DT
VALID_ TO_DT
VALUE
T0
T1
T0
-
1
T1
T3
T0
T1
1
T1
T2
T1
-
3
T2
T3
T1
T2
3
T2
T3
T2
-
4
T3
T4
T0
T1
1
T3
-
T1
T2
3
T3
T5
T2
T5
4
T3
-
T3
-
6
T4
-
T0
T1
2
T5
-
T2
T3
5


a Visual representation of the data and the transformation:

Let's look at the 'SQL' Code to produe this past. We see that we create set of vertical oriented timeslices and a set of horizontal timelsieces to fill in the missing past.

Pseudo SQL Code

Vertical History:

Horizontal History

Total History:

Conclusion

Assuming the user controlled timeline (valid timeline, stated timeline) does not diverge significantly from the unknown transaction/logged timeline you can reconstruct the non logged past in a information system/data warehouse by pivoting and substituting the target transaction timeline with the source user controlled timeline for the missing past. By repeating this information we can 'repair the past'. This is especially handy when querying the past. Note that we are now repeating the past from the present because we 'forgot' to register this past in the first place. The question if we should actually persist this past or just derive this information at query time is a separate performance discussion.








No comments: