VaultSpeed

VaultSpeed offers a best-in-class data automation solution built on the Data Vault 2.0 methodology. Enterprise customers around the world rely on VaultSpeed to automate the integration of data from multiple sources and to store industry-specific metrics on top of the integrated data sets. VaultSpeed is the standard SaaS solution for companies that want to simplify building and maintaining their data cloud.

VaultSpeed is headquartered in Leuven, Belgium, with offices in Seattle, USA, and Vilnius, Lithuania.

For the hierarchical link, a self-reference on the Product Category table is defined in the raw vault. The result in the raw vault is a hub with a link that contains the hierarchical data. In the business vault, we create 2 snapshot PITs on each hub. Using the PITs, we build 2 dimensions (Product and Product Category) that can be combined in a view. We also built a bridge combining the hubs and links needed to build the product hierarchy.
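
As a rough illustration of how a self-referencing link can be flattened into a hierarchy, the sketch below walks child-to-parent rows and produces the full path for each category. This is not VaultSpeed's generated code; the function name `build_paths` and the category names are assumptions made up for the example.

```python
# Hypothetical rows of a hierarchical link: (child_category, parent_category).
# The category names are invented for illustration only.
hier_link = [
    ("Red wine", "Wine"),
    ("White wine", "Wine"),
    ("Wine", "Beverages"),
]


def build_paths(link_rows):
    """Return {category: [root, ..., category]} by walking child -> parent."""
    parent_of = dict(link_rows)
    categories = set(parent_of) | set(parent_of.values())
    paths = {}
    for category in categories:
        path, current, seen = [category], category, {category}
        while current in parent_of:
            current = parent_of[current]
            if current in seen:  # guard against cyclic source data
                raise ValueError(f"cycle detected at {current!r}")
            seen.add(current)
            path.append(current)
        paths[category] = list(reversed(path))
    return paths


paths = build_paths(hier_link)
```

A bridge over the hierarchy would hold comparable root-to-leaf information, one row per category and ancestor.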

In the source modeler, tag the source object with the satellite signature and make it multi-active by activating the multi-active flag. Then set the VON attribute as the subsequence attribute. That is all you need to do to load the data set correctly into the raw vault.
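
To make the role of the subsequence attribute concrete, here is a minimal sketch of a multi-active satellite in which several rows per business key stay active at once, distinguished by VON. The column names `customer_id` and `street`, the sample rows, and the function `to_multi_active_sat` are assumptions for illustration, not VaultSpeed output.

```python
# Hypothetical source rows: one customer can have several rows,
# each identified by its VON (valid-from) value.
source_rows = [
    {"customer_id": "C1", "VON": "2022-01-01", "street": "Main St 1"},
    {"customer_id": "C1", "VON": "2022-06-01", "street": "Oak Ave 7"},
    {"customer_id": "C2", "VON": "2022-03-15", "street": "Elm Rd 3"},
]


def to_multi_active_sat(rows, bk_column, subseq_column):
    """Key each satellite row by (business key, subsequence attribute)."""
    sat = {}
    for row in rows:
        key = (row[bk_column], row[subseq_column])
        # Everything except the key columns is descriptive payload.
        sat[key] = {
            name: value
            for name, value in row.items()
            if name not in (bk_column, subseq_column)
        }
    return sat


sat = to_multi_active_sat(source_rows, "customer_id", "VON")
```

Without the subsequence attribute in the key, the second row for C1 would overwrite the first.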

Relationships are defined in VaultSpeed by drawing the relationship between the hubs involved, which is easy to do using drag and drop in VaultSpeed’s source editor. The relationship is then created as a link table. M:N relationships can also be created by tagging an entire object as a many-to-many link. Keys can be made driving keys by activating a flag.

In the same way that you can set an object as an M:N link, you can also tag it as a transactional link (or non-historical link). Each instance is treated as a transaction. If you have transactional links with deletes, we offer the option to handle them by inserting the negative of the deleted transaction.
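
The idea of inserting the negative of a deleted transaction can be sketched as follows. The `Transaction` fields and the helper `reversal_for` are hypothetical names chosen for this example; amounts are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    quantity: int
    amount_cents: int
    is_reversal: bool = False


def reversal_for(deleted: Transaction) -> Transaction:
    """Build the counter-booking that cancels out a deleted transaction."""
    return replace(
        deleted,
        quantity=-deleted.quantity,
        amount_cents=-deleted.amount_cents,
        is_reversal=True,
    )


# The link is insert-only: nothing is updated or physically deleted.
link = [Transaction("T1", 2, 1990), Transaction("T2", 1, 500)]
link.append(reversal_for(link[0]))  # T1 was deleted in the source

net_amount_cents = sum(t.amount_cents for t in link)
```

Aggregating over the link then gives totals that reflect the delete, while the original transaction stays in place for auditing.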

VaultSpeed offers the capability to integrate data early in the raw Data Vault (RDV). In this case we use our splitter and hub grouping. We use the splitter to separate business keys that were delivered in one flattened source object. Our hub grouping capability integrates the hubs' business keys from multiple sources in the raw vault. Our BK management screen helps you set good names for the combined BK attributes and lets you control the order of the attributes in the BK.

You can select the “reference” signature object type for these source tables in the editor. The object will then be treated as a reference (REF) table. By default, a REF table is loaded with a truncate and insert.

You can choose to keep the history and use the load date for historical comparison.

VaultSpeed uses the hashdiff to tell the difference between records. Intra-load-cycle changes should be enabled in our parameters. A duplicate BK with identical data generates the same hashdiff and is loaded only once into the SAT.

A duplicate BK with differing data has a different hashdiff, so both instances are loaded into the SAT; otherwise, you would lose data. The load date of the second record is incremented by 2 microseconds. You can also use log positions to derive the order of the data.
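
The following is a minimal sketch of this behaviour, not the code VaultSpeed generates. The hashdiff recipe (MD5 over pipe-joined, name-sorted attribute values) and the function names are assumptions, as is the choice to keep adding 2 microseconds for a third or later version of the same key.

```python
import hashlib
from datetime import datetime, timedelta


def hashdiff(attributes: dict) -> str:
    """Hash the descriptive attributes in a fixed (name-sorted) order."""
    payload = "|".join(str(attributes[name]) for name in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def stage_for_satellite(rows, load_date):
    """rows: (business_key, attributes) tuples in arrival order."""
    staged = []
    kept = {}  # business key -> hashdiffs already staged in this load cycle
    for bk, attrs in rows:
        hd = hashdiff(attrs)
        seen = kept.setdefault(bk, [])
        if seen and seen[-1] == hd:
            continue  # identical to the previous version: load only once
        # Shift later versions so every row gets a distinct load date.
        ts = load_date + timedelta(microseconds=2 * len(seen))
        seen.append(hd)
        staged.append((bk, ts, hd, attrs))
    return staged


rows = [
    ("C1", {"name": "Ann"}),
    ("C1", {"name": "Ann"}),   # exact duplicate, dropped
    ("C1", {"name": "Anne"}),  # real change within the same load cycle
]
staged = stage_for_satellite(rows, datetime(2023, 1, 1))
```

The sketch compares each row only with the previous version of the same key, so a value that changes and then changes back is still kept as a third version.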

This problem is also solved in the load of the raw vault. Since the business key is not provided, we cannot calculate the hash key. That is why these records are loaded into the sat using the unknown record in the hub.

If the records are corrected later on, they will be loaded in the hub as well, and the old record will get delete flag = Y because it no longer exists.
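
A simplified sketch of the fallback to the unknown record is shown below. The all-zero placeholder key, the MD5 hashing, and the trim-and-uppercase normalization are assumptions for illustration; the actual values depend on your VaultSpeed configuration.

```python
import hashlib

# Assumed placeholder for the hash key of the hub's unknown record.
UNKNOWN_HASH_KEY = "0" * 32


def hub_hash_key(business_key):
    """Return the hub hash key, or the unknown key if no BK was delivered."""
    if business_key is None or not business_key.strip():
        return UNKNOWN_HASH_KEY
    normalized = business_key.strip().upper()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()
```

Satellite rows without a business key then attach to the unknown record instead of being rejected, and move to a real hub key once the source delivers one.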

Changes are handled by the standard DV2 approach, using the DV2 hashdiff to identify them. If the data has actually changed, we load it as a new satellite record starting on that load date.

Deletes are handled by the standard DV2 approach: a new record is inserted with the delete flag set to YES to indicate that the record was deleted.

Note that VaultSpeed can also apply end-dating logic: the previous SAT record is end-dated on the delete. We support delete management on full loads as well as on incremental loads.
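
For a full load, delete detection boils down to comparing the keys that are currently active in the vault with the keys present in the new load. The sketch below illustrates that comparison; the function name, the row layout, and the flag value "Y" are assumptions for illustration, not VaultSpeed's generated logic.

```python
from datetime import datetime


def detect_deletes(active_keys, full_load_keys, load_date):
    """Return satellite rows flagging keys that vanished from a full load."""
    missing = set(active_keys) - set(full_load_keys)
    return [
        {"hash_key": key, "load_date": load_date, "delete_flag": "Y"}
        for key in sorted(missing)
    ]


# Keys A, B and C are active in the vault; the new full load only has A and C.
deleted = detect_deletes({"A", "B", "C"}, ["A", "C"], datetime(2023, 1, 2))
```

In an incremental load the source has to signal the delete explicitly, because a missing key there only means that nothing changed.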

The EARLY_ARRIVING_FACTS parameter ensures that the hub can also be loaded from the child object, which solves the problem. When the BK is not equal to the PK, we have the following parameters:

REFERENTIAL_INTEGRITY_VALIDATED

If the link is not found, the record is loaded into an ERR table and can be picked up later on.

REFERENTIAL_INTEGRITY_FORCE_LINK_LOAD

When enabled, it will force the link load even if it cannot be found. The record will point to the unknown record until the parent record is provided.
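
The difference between the two behaviours can be sketched with a toy link loader. This is a simplification for illustration only: the function `load_link`, its `force_link_load` argument, and the `UNKNOWN` marker are hypothetical and do not reproduce the exact semantics of the VaultSpeed parameters.

```python
UNKNOWN = "UNKNOWN"  # stand-in for the hub's unknown record


def load_link(child_rows, known_parents, force_link_load):
    """Split child rows into link rows and error rows.

    child_rows: (child_key, parent_key) tuples.
    known_parents: parent keys that can currently be resolved.
    """
    link, err = [], []
    for child_key, parent_key in child_rows:
        if parent_key in known_parents:
            link.append((child_key, parent_key))
        elif force_link_load:
            # Load anyway and point to the unknown record for now.
            link.append((child_key, UNKNOWN))
        else:
            # Park the row so it can be picked up in a later load.
            err.append((child_key, parent_key))
    return link, err


rows = [("O1", "C1"), ("O2", "C9")]  # C9 has not arrived yet
validated_link, validated_err = load_link(rows, {"C1"}, force_link_load=False)
forced_link, forced_err = load_link(rows, {"C1"}, force_link_load=True)
```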

Deletes are handled by the standard DV2 approach: a new record is inserted with the delete flag set to YES to indicate that the record was deleted.

Note that VaultSpeed can also apply end-dating logic: the previous SAT record is end-dated on the delete. We support delete management on full loads as well as on incremental loads.

Dimensions are built on top of PITs in VaultSpeed; you can have snapshot PITs or detailed PITs. We need to filter on the correct snapshot timestamp to get a valid record of the dimension at a certain point in time.
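
Conceptually, a snapshot PIT stores, for each snapshot timestamp, a pointer to the satellite record that was valid at that moment. The sketch below shows that lookup for a single hub key; the function name `pit_row` and the sample dates are assumptions made up for the example.

```python
from bisect import bisect_right
from datetime import date


def pit_row(sat_load_dates, snapshot):
    """Return the latest satellite load date at or before the snapshot."""
    ordered = sorted(sat_load_dates)
    idx = bisect_right(ordered, snapshot)
    return ordered[idx - 1] if idx else None


# Load dates of one hub key's satellite records (invented values).
sat_load_dates = [date(2022, 1, 1), date(2022, 3, 1), date(2022, 6, 1)]
snapshots = [date(2021, 12, 31), date(2022, 3, 1), date(2022, 4, 15)]

pit = {snap: pit_row(sat_load_dates, snap) for snap in snapshots}
```

Filtering the dimension on one snapshot timestamp then selects exactly one satellite version per key, or none if the key did not exist yet.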

For the reports, we make use of the PITs, bridges, and virtual dimension views you can build with VaultSpeed. In this case, we built dimension views for the product, position, order, and delivery, and we created a bridge between all three to connect the data.

Business rules can be built in VaultSpeed in several ways. For example, you could use our “links across sources” feature, which enables you to build links between sources; this is useful for linking the order to the customer based on other attributes. You can also build more complex business rules using VaultSpeed Studio, where developers get total freedom to build their own automation templates.

A full metadata export is available in VaultSpeed providing input for technical impact analysis. You can configure the metadata attributes you want to export and then run this configuration against a specific release. All this lineage data can be exported using our GUI or REST API.

Error handling for data loads is done in the scheduler of your choice (in this case we chose Apache Airflow). All the transformation code and all the workflow DAGs generated by VaultSpeed are fully restartable, so no loss of data or inconsistencies will occur. Workflows can also be partially restarted.

For data quality checks, you can build your own data quality templates using VaultSpeed Studio. We also integrate with various DQ platforms through our API.

VaultSpeed integrates with various orchestration tools. It will generate the loading order and schedule. This can then be automatically deployed in your orchestration tool.

Multiple flavors are available: Apache Airflow, Azure Data Factory, and Generic FMC; the latter can be used with any scheduler.
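
Deriving a loading order from object dependencies is essentially a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+) on a made-up dependency graph; the object names and the dependencies between them are assumptions for illustration and do not reflect the order VaultSpeed actually generates.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies: each object maps to the objects it must wait for.
dependencies = {
    "hub_customer": set(),
    "hub_order": set(),
    "sat_customer": {"hub_customer"},
    "lnk_order_customer": {"hub_order", "hub_customer"},
}

# static_order() yields every object after all of its predecessors.
load_order = list(TopologicalSorter(dependencies).static_order())
```

A scheduler can run objects without mutual dependencies in parallel, up to its configured concurrency.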

You can deploy directly to the target platform into your data runtime environment (typically for development). You can also deploy to your designated Git repository (typically to feed CI/CD processes), or create a custom deploy script to tailor the deployment entirely to your needs.

The following can be configured in our Flow Management Control screen:

Set a start date (the initial load date and the point from which subsequent loads will be fed into the DWH)
Set concurrency (how many jobs can run in parallel)
Group tasks together in execution blocks
Schedule loads at regular intervals (only for batch loading; streaming does not need this setting)
Set the target and Databricks cluster (or ETL tool in other data stacks)
Deploy this into your preferred scheduler

We support various data platforms such as Snowflake, Databricks, Azure Synapse, and Google BigQuery. The code is deployed as DDL, procedures, or notebooks. We also support multiple ETL solutions such as dbt, Matillion ETL, or Talend if you would like to use an ETL tool to run the code.

VaultSpeed is a cloud SaaS solution
No virtual machines, no AMIs, no Docker or Kubernetes needed
Always running on the latest version
Always using the latest DV2.0 support
Lightweight Java agent that sits in your network (hybrid approach)
Agent handles source metadata harvest and code deployments
Agent can be installed on any OS, using Java (Oracle JDK 8)
Install takes 10 minutes; installation scripts for Linux, Windows, and macOS are available
