Typical problems for a Data Vault Warehouse are stored in the data model and in the data. The cases range from Stage to Raw and Business Vault to Data Mart.
General functionality test of Raw Vault loading
The functionality of the Stage and Raw Vault can be easily checked with the so-called Yedi Test. To do this, simply restore the original data from the Raw Vault and compare it against the original delivery from the Stage. This test ensures that all data has been loaded correctly.
The entity Residences has the CustomerID (KundeID) and the StartingTime (Von) as primary keys and is therefore predestined for a multi-active satellite.
Identifying Relationship und driving Key Patterns
The relationship between order and position cannot change. The key situation makes every change a deletion and a new creation.
For all other relationships, the relationships can and will change. The test cases are all implemented on the foreign key in the customer to the club partner. The following situations occur here:
- the foreign key is optional and therefore also NULL
- the foreign key changes between association partners
- the foreign key changes from valid to invalid and in some cases back to valid again.
m:n Tabelle ohne eigenen Key
The table of deliveries (Lieferung) does not have its own primary key and usually only occurs once, as only successful deliveries are transmitted to the dwh.
This can be solved in several possible ways: keyed instance, transactional link, dependent child link, …
Integration der Bestellung
We have 2 sources. These are to be loaded integrated into the Order and Item hubs. The primary keys of both systems are simply incremented. The roadshow has an additional ‘RS’ for safe differentiation.
One could argue that the integration should be in the Business Vault. Because the keys are not really the same, the sets are disjoint, it is much more secure. Those are all valid arguments. The point here is to present the ability to integrate early.
There is a list of reference values for the product type:
Test cases in the data
We get 3 complete deliveries. The evaluation of the data is done on these 3 deliveries and must provide the correct data in each case.
Duplicates in the loading data
There are 2 types of duplicates in the product. In the first case it is an actual duplicate, all attributes are the same. In the second case, the attributes contradict each other. The data is in Testcases/Dubletten.sql:
- ProduktID 20 is the actual duplicate
- ProduktID 21 is a completely different product
Here we just want to see how this is handled.
Records without Business Key
In the Delivery Service table (LieferDienst), there are records with valid values without a key. Again, just the question of how to handle this.
In Testcases/Missing-BusinessKey.sql there are two delivery services without a business key.
Changes in customer data
A very simple test case, the data in the customer changes to a value in delivery 2 and gets the values from delivery 1 again in delivery 3. The test case is on the KundedID 107
Deletions in the customer
There are cases in the customer where the customer was deleted in the second delivery and reappears in the third delivery. A test case is customer 70.
Delivery addresses without customer
The first delivery contains delivery addresses for which there is no record with the same customer ID in the customer. The records have the CustomerID (KundeID) 999, 998 and 997.
Deletions of orders
Purchase orders are relevant to the count and are deleted between deliveries.
- between 1 and 2 the purchase orderID (BestellungID) 99, 220 and 465.
- between 2 and 3 the orderID 1470 and the 1288.
Changes in dimensions
The hierarchy of the product category changes completely with both deliveries. So we have 3 different product hierarchies. These are to be displayed as as-what at the respective reporting time.