Introduction
There are always a lot of discussions about the nature of hubs. There have been several linkedin discussions and blog-post who address the issue. Since these discussions are usually crossing all kind of levels of conceptualization I'd like to clarify on which levels these discussions take place. In this short post I'd like to shortcut these different levels of discussions around the nature of hubs.Levels of Abstraction
I'll discern 4 levels of abstraction:- The logical level, on which hubs are just (business) keys
- On the conceptual level in which hubs are identified by concepts and their (stated) identifiers
- On the meta identifier level where we design and construct and scope identifiers and their supporting (identification) processes needed to identify concepts at the conceptual level.
- On the ontologic level where we identify abstract concepts and ignore actual identification design.
1. Hubs as (key) transformations
Basically a HUB is an independent (logical) key (a key with no part of it dependent on another key). The only exception is when there are 2 keys and one is the surrogate key of the other, then the surrogate key does not become a hub (but optionally a keysat), but this is a minor issue since we can choose to ignore (sourced) surrogate keys altogether. When there is ONLY a surrogate key, we have an interesting issue, since a key on it's own can never be a surrogate key (because it's only a surrogate for another key), even if that was the intention. We might say it's a 'technical' key that will source a Hub when no other candidate key is found. The sources for Hubs/keys for a Data Vault(=Raw Data Vault+Rule Data Vault) are all relevant source system data models and all business data models. If there are several situations that try to model the same hub with different keys, you basically model all of the distinct keys. Some optimization/consolidation is possible when having multiple keys for the same concept but these decisions should be delegated to the correct modeling of the business information model using specialization/generalization. This idea relegates a lot of design and definition of (central) hubs to (master) data management and conceptual data model design. Since we distinguish between concepts and keys, a concept identified with a dependent key is a link by definition, but still an integration point. From a conceptual point a link can be seen as a hub as well.2. Hubs as conceptual entities
Most people will equal hubs with conceptual entities like customer or product. The assumption is that an important master hub usually houses an important identifier like tax id, Social Security Number or product code. The discussion on which identifiers to use (or ignore) as 'master' hub is however not a Data Vault discussion, but a business information model discussion. Here we try to find business identifiers with the correct scope and meaning. In the Data Vault we just implement (one or more) of the available model identifiers as the master key in our hub. Again, if we have several identifiers for (entity sub-types of ) one concept we can opt to use key satellites to model this in the Data Vault.
3. Hubs as Identification schemes
A lot of practitioners try to fix business key issues in the Data Vault, with the goal to create/construct or identify a hub that will house the master list of a certain entity. They are often enticed to try to construct their own identification or consolidation scheme. Again, this is not the task of a Data Vault but a task of (master) data management. Approved matching and fixing rules can still be applied to the (Business) Rule Data Vault. These kinds of actions are usually a result of failing to find/implement a single good identifier for a conceptual entity, which in turn might lead to multiple entities encoding the same concept.