Crossnative

DVG (DataVault-Generator) focuses on automating the entire workflow of a DataVault project. This includes documentation, analysis, code generation, data lineage, the data catalog, business rules, deployment, installation, migrations, data quality checks, tests, production loads, GDPR compliance, target-group-specific datamarts, bitemporal history, correction/simulation runs, and more.

As a result, the project can do entirely without developers, and the DWH can be built by data modelers with a business focus.
DVG achieves its high efficiency by requiring only a small set of metadata, whose entry is optimally supported by code completion, templates, reference checks, and more. Misconfigurations are avoided because validation and reference checking take place immediately during input; the user receives error and warning messages right away.

DVG consistently applies proven DevOps mechanisms and versions the metadata in Git. This allows all project workflows to be 100% automated with CI/CD pipelines.

With the provided multi-environment support, different requirements can be implemented in parallel.
DVG is ideally suited to an agile project approach with frequent production releases. Long implementation times are finally a thing of the past.

Hierarchical links are modeled as a link that references the same hub twice. Which side is the parent and which the child is determined by the names of the references and by the driving key property (green in the graphic).
In the Datamart, the hierarchy of product categories has been broken down into a flat structure within the product dimension.
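To make the flattening concrete, here is a minimal sketch in Python; the category names and the child-to-parent mapping are invented for illustration and are not DVG functionality:

```python
# Hypothetical sketch: resolving a hierarchical link (the same hub
# referenced twice) into flat levels for the product dimension.

# (child_category -> parent_category) pairs as they would appear in the
# link, with the driving key on the child side.
hierarchy_link = {
    "Smartphones": "Electronics",
    "Electronics": "All Products",
    "Garden Tools": "All Products",
}

def category_path(category: str) -> list[str]:
    """Walk the self-referencing link upwards to the root."""
    path = [category]
    while path[-1] in hierarchy_link:
        path.append(hierarchy_link[path[-1]])
    return path

# Flattened levels (root first) for two example categories:
for cat in ("Smartphones", "Garden Tools"):
    print(cat, "->", list(reversed(category_path(cat))))
```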

Crossnative_HierarchicalLinks-1

In addition to the usual technical history in DataVault, the place of residence also has a functional history. Since DVG always historizes all data bitemporally, only the field to be used for the functional history needs to be configured in DVG.
"Real" multi-active satellites (e.g. multilingual product descriptions) could be mapped either as a satellite of a link between a product hub and a language hub, or as a satellite of a sub-hub with the business key ProductId+LanguageId.
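A minimal sketch of the sub-hub variant, assuming invented names and a simple MD5 key hash:

```python
# Illustrative sketch: the business key is the combination
# ProductId + LanguageId, so each language gets its own satellite row
# instead of a multi-active satellite. Names and hashing are assumptions.
from hashlib import md5

def sub_hub_key(product_id: str, language_id: str) -> str:
    # Composite business key hashed like any other hub key.
    return md5(f"{product_id}|{language_id}".encode()).hexdigest()

descriptions = {
    sub_hub_key("P42", "en"): "Cordless drill",
    sub_hub_key("P42", "de"): "Akkuschrauber",
}
```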

Crossnative_MultiActiveSatellite-2

In DVG, an effectiveness satellite is automatically created for each link. No user input is required for this.
The driving key (green in the graphic) is modeled for each link.
Relationships are automatically set to invalid (see the sketch after this list) if:
– a deletion is explicitly delivered with a delta delivery
– or the relationship is no longer contained in a full delivery
– or a new value is supplied for the same driving key
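A hedged sketch of these three rules in plain Python (not DVG code; names are assumptions):

```python
# Sketch of the three invalidation rules for a link effectiveness
# satellite with a driving key.

def apply_delivery(active: dict, delivery: dict, full: bool,
                   deletions: set = frozenset()) -> dict:
    """active / delivery map driving_key -> related key."""
    for dk in deletions:                 # rule 1: explicit deletion (delta)
        active.pop(dk, None)
    if full:                             # rule 2: missing from a full delivery
        for dk in list(active):
            if dk not in delivery:
                del active[dk]
    for dk, value in delivery.items():   # rule 3: new value for same driving key
        active[dk] = value               # the old relationship is superseded
    return active

# Example: customer C1 moves from address A1 to A2 (driving key = customer)
state = {"C1": "A1"}
apply_delivery(state, {"C1": "A2"}, full=False)
print(state)  # {'C1': 'A2'}
```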

Crossnative_ValidityInRelationships-3

The table is modeled as a transactional link. As the delivery addresses/delivery services can be uniquely assigned to the item, the driving key is only on the item.
Context information on the delivery is stored in the satellite.
Alternatively, modeling as a hub with a business key consisting of the three fields involved would also be possible.

Crossnative_MtoNTables-4

In the DVG, all target tables can be loaded from multiple sources.
The automatically created RawVault contained separate hubs for delivery items from the web store and from the roadshow. For early integration, the roadshow hub was simply deleted.
The automatically generated loading of the RawVault from the roadshow was then reconfigured so that the fields are transferred directly to the target tables (hubs, links and satellites) of the web store. It helps that DVG immediately reports, via validation, all inconsistencies caused by deleting the hub, so the reconfiguration of the loading de facto consists only of working through these messages. In particular, the type inconsistency of the order ID (numeric vs. character) between the webshop and the roadshow was also recognized automatically.
DVG also offers the option of carrying out early integration at satellite level. In addition to the technical field RecordSource, a field SourceArt was introduced in order to differentiate the data later on; it is filled with the constant values "Web" or "Road".
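As a rough illustration of such a reconfigured mapping (the field names and the direction of the type cast are assumptions, not the actual challenge schema):

```python
# Sketch of early integration at loading level: a roadshow delivery is
# mapped straight onto the web store's target structures.

def map_roadshow_row(row: dict) -> dict:
    return {
        "OrderId": str(row["RoadshowOrderNo"]),  # unify the order ID type
        "ItemId": row["ItemNo"],
        "SourceArt": "Road",                     # constant source marker
    }

def map_webshop_row(row: dict) -> dict:
    return {
        "OrderId": str(row["OrderId"]),
        "ItemId": row["ItemId"],
        "SourceArt": "Web",
    }

print(map_roadshow_row({"RoadshowOrderNo": 4711, "ItemNo": "I-1"}))
```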

Crossnative_EarlyIntegration-5

The DVG has a reference/harmonization feature that enables the following:
– Assignment of descriptive texts to codes
– Harmonization of different codes from different delivery systems to uniform values
This feature is also historized bitemporally.
Therefore, only one harmonization category "productType" needs to be created in DVG, and the corresponding product descriptions delivered.
When loading the datamarts, the product types can then be converted into descriptive texts.
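A minimal sketch of such a harmonization category, with invented source systems, codes and texts:

```python
# Illustrative harmonization table: codes from different delivery systems
# are mapped to one uniform value plus a descriptive text, which the
# datamart load then substitutes. All content here is assumed.

HARMONIZATION = {
    "productType": {
        ("webshop", "1"):  ("TOOL", "Power tool"),
        ("roadshow", "T"): ("TOOL", "Power tool"),
        ("webshop", "2"):  ("ACC", "Accessory"),
    }
}

def harmonize(category: str, source: str, code: str) -> tuple[str, str]:
    return HARMONIZATION[category][(source, code)]

print(harmonize("productType", "roadshow", "T"))  # ('TOOL', 'Power tool')
```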

Crossnative_HistorizedReferenceTable-6

A "duplicateOrder" can be defined for duplicates in DVG; in real projects, this is usually a timestamp. Based on the "duplicateOrder", DVG marks, for each business key/functional validity in the staging tables, the last duplicate record with the status "ok" and all others with the status "duplicate".
Only data records with the status "ok" are transferred to the RawVault.
As there were also completely identical duplicates in the test data, only the last data record delivered was processed in the challenge.
As DVG permanently saves the data in staging tables, the duplicates are not lost.
If duplicates occur and no "duplicateOrder" is modeled, this is reported by the automatically executed quality checks.
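Sketched in plain Python (field names assumed):

```python
# Hedged sketch of the duplicate marking: per business key and functional
# validity, the record with the highest duplicateOrder (typically a
# timestamp) is marked 'ok', all others 'duplicate'.
from collections import defaultdict

def mark_duplicates(rows: list[dict]) -> None:
    groups = defaultdict(list)
    for row in rows:
        groups[(row["business_key"], row["valid_from"])].append(row)
    for group in groups.values():
        group.sort(key=lambda r: r["duplicate_order"])
        for row in group[:-1]:
            row["status"] = "duplicate"  # kept in staging, not loaded
        group[-1]["status"] = "ok"       # only this row reaches the RawVault

rows = [
    {"business_key": "P1", "valid_from": "2024-01-01", "duplicate_order": "10:00"},
    {"business_key": "P1", "valid_from": "2024-01-01", "duplicate_order": "12:00"},
]
mark_duplicates(rows)
print([r["status"] for r in rows])  # ['duplicate', 'ok']
```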

Crossnative_DuplicateData-7

In the DVG, you can configure whether and which fields of the business key may be NULL.
If NULL is supplied for permitted business key fields, the hash key is calculated on the basis of an empty string and the data record is processed.
If NULL is not permitted, the data record is not transferred to the staging table but to an error table instead. As part of the data quality checks, this is reported with the type "wrongRows".
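A sketch of this NULL handling, assuming a simple MD5 hash over pipe-separated key parts (the actual hashing scheme is not specified here):

```python
# Permitted NULL fields enter the hash as an empty string; otherwise the
# record is diverted to an error table and reported as 'wrongRows'.
from hashlib import md5

def hash_key(fields: list, null_allowed: list[bool]):
    parts = []
    for value, allowed in zip(fields, null_allowed):
        if value is None:
            if not allowed:
                return None    # caller routes the record to the error table
            value = ""         # NULL permitted -> hash over empty string
        parts.append(str(value))
    return md5("|".join(parts).encode()).hexdigest()

print(hash_key(["P42", None], [True, True]))   # processed normally
print(hash_key([None, "X"], [False, True]))    # None -> error table / wrongRows
```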

Crossnative_RowsWithoutBusinesskey-8

In DVG, all data (staging, vault model, datamart) is always historized bitemporally.
To do this, the data modeler only has to configure the field in which the delivery system provides the functional validity.
If a delivery system does not provide business validity, "loadtime()" can be configured, i.e. the business and technical validity are always identical.
All fields required for the bitemporal history are created automatically and do not need to be managed by the modeler.
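A rough sketch of the idea (the field names are assumptions; in DVG these fields are generated, not hand-written):

```python
# Illustrative bitemporal fields: business validity comes from a configured
# source field, or defaults to the load time ("loadtime()").
from datetime import datetime, timezone

def bitemporal_row(data: dict, business_valid_field: str | None) -> dict:
    load_time = datetime.now(timezone.utc)
    return {
        **data,
        "valid_from": data[business_valid_field] if business_valid_field else load_time,
        "load_from": load_time,  # technical validity, always set automatically
    }
```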

Crossnative_ChangesOfAttributes-9

An effectiveness satellite is automatically created for each hub in DVG. No user input is required for this.
Business keys are automatically set to invalid if:
– a deletion is explicitly delivered with a delta delivery
– or the business key is no longer included in a full delivery
Delivery systems can decide per delivery whether to provide a delta or a full delivery. A delivery system that normally delivers deltas can therefore, for example, independently decide to deliver a full load once after a manual intervention in the source DB, in order to pick up the manual changes.
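Sketched as a hedged example (names assumed):

```python
# Per delivery, the system declares delta or full, and the hub's
# effectiveness satellite is updated accordingly.

def end_dated_keys(active_keys: set, delivered_keys: set,
                   is_full: bool, explicit_deletions: set) -> set:
    if is_full:
        # full delivery: every key not re-delivered is set to invalid
        return active_keys - delivered_keys
    # delta delivery: only explicitly delivered deletions end validity
    return active_keys & explicit_deletions

print(end_dated_keys({"A", "B", "C"}, {"A", "B"}, True, set()))  # {'C'}
print(end_dated_keys({"A", "B", "C"}, {"A"}, False, {"B"}))      # {'B'}
```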

Crossnative_DeletionOfBusinessKeys-10

How invalid foreign keys are handled can be configured with the goldensource parameter:
1) Missing references are automatically added to the target hub and its effectiveness satellite. The data records then contain a corresponding entry in the RecordSource field.
2) Links and satellites are still filled, and the data quality checks report a violation of referential integrity. The missing hub entries can then be added in real projects with the next delivery.
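A sketch of the two behaviors; which value of the goldensource parameter selects which behavior is an assumption here, as are all names:

```python
# Behavior 1: create the missing hub entry, marked via RecordSource.
# Behavior 2: load anyway and record a referential-integrity finding.

def load_link_row(row: dict, hub_keys: set, add_missing: bool,
                  findings: list) -> None:
    if row["target_key"] not in hub_keys:
        if add_missing:
            hub_keys.add(row["target_key"])          # behavior 1
            row["RecordSource"] = "added-for-missing-reference"
        else:
            findings.append(("referential_integrity", row["target_key"]))  # behavior 2
    # in both cases the link/satellite row itself is written
```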

Crossnative_InvalidForeignkeys-11

Orders are treated completely analogously to business objects in DVG. Therefore, the same statements apply as for Deletion of Business Keys (10).

Crossnative_DeletionOfOrders-12

Like all data, dimensions in DVG also have a complete bitemporal history.
DVG can create different target-group-specific data access layers on the same data. Among other things, users can choose whether they see only the most recent data, data for specific external time grids, or all historical data. This allows the complexity of the history to be adapted to the target groups, providing a bitemporal history, a one-dimensional history, or no history at all.
Note: It is also possible to configure which Datamart tables the respective target group sees and whether there is access to GDPR-protected data.
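As a hedged illustration, two such access layers over the same bitemporal rows might look like this (field names follow the earlier sketch and are assumptions):

```python
# The same bitemporal data served as latest-state-only or as an
# as-of bitemporal view.

def latest_only(rows: list[dict]) -> list[dict]:
    """'No history' view: keep only rows that are currently valid."""
    return [r for r in rows if r["valid_to"] is None and r["load_to"] is None]

def as_of(rows: list[dict], business_date, load_date) -> list[dict]:
    """Bitemporal view: state as known at load_date, valid at business_date."""
    def covers(frm, to, d):
        return frm <= d and (to is None or d < to)
    return [r for r in rows
            if covers(r["valid_from"], r["valid_to"], business_date)
            and covers(r["load_from"], r["load_to"], load_date)]
```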

Crossnative_ChangeOfDimensions-13

DVG provides the data correctly. This can be checked with queries against the provided datamart layer (star schema).

Crossnative_KeyFigures-14

Business rules are mapped in the DVG using a metadata-based rule engine. Additional ETL tools, the input of SQL code or similar steps typically performed by developers are not required.
The same rule engine is also used to load the data marts and to create functional data quality checks.
Results of business rules are stored in the BusinessVault or in datamarts.
Details are shown in the PDF.

Crossnative_BusinessRules-15

For data modelers, DVG has a built-in reference search that shows where every table and field is used. These can be references in the context of referential integrity, loading, use in business rules, and others. Double-clicking a hit immediately opens the corresponding location in the model files.
For end users, DVG provides a complete lineage at field, table and process level. This data is available in metadata tables and can be analyzed with any reporting tool.
In addition, a source and impact analysis is provided at table level as an HTML file.
There are also metadata tables with data catalog information.
If required, DVG exports interface files that can be imported into the data governance tool used by the customer.

Crossnative_DataLineage-16

It is not necessary to configure error handling in DVG. DVG automatically distinguishes between three types of errors:

1. Critical process errors
An example would be an exception in a loading process (e.g. due to an internal error in the database).
In this case, DWH loading is aborted. Once the cause of the error has been rectified, loading can be restarted and data processing continues from the point of termination.

2. Serious errors in individual data records
As a rule, these are incorrect number/date formats that cannot be converted.
The faulty records are written to an error table and not transferred to staging/vault/datamart tables. DWH loading continues with the error-free data. The number of unprocessed data records is reported as part of the data quality checks (a sketch of this behavior follows the list).

3. Data quality checks
Technical data quality checks (referential integrity, unfilled mandatory fields, violations of business keys, etc.) are created automatically.
Functional data quality checks can also be modeled.
All quality checks are executed with every DWH load and the results are provided in metadata tables.
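A hedged sketch of error type 2, with assumed field names and formats:

```python
# Records whose number/date fields cannot be converted go to an error
# table; loading continues with the error-free data.
from datetime import datetime

def stage_rows(raw_rows: list[dict]) -> tuple[list[dict], list[dict]]:
    staged, error_table = [], []
    for row in raw_rows:
        try:
            staged.append({
                "order_id": row["order_id"],
                "amount": float(row["amount"]),
                "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d"),
            })
        except (ValueError, KeyError):
            error_table.append(row)  # reported later by the quality checks
    return staged, error_table

good, bad = stage_rows([
    {"order_id": "1", "amount": "9.99", "order_date": "2024-05-01"},
    {"order_id": "2", "amount": "n/a",  "order_date": "2024-05-01"},
])
print(len(good), len(bad))  # 1 1
```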

Crossnative_ErrorHandling-17

The DVG tool is no longer required to load the DWH.
The DWH loading can be integrated with any orchestration tool that can call shell scripts (e.g. Control-M, UC4, Airflow or simply crontab).
The script for initializing the DWH load must be called once per DWH load. In addition, each delivery system must call the ready script, stating that all of its data is available and whether a delta or a full delivery has been provided.
All these scripts only maintain metadata and are completed after a few seconds.
The actual DWH loading takes place with the help of internal database schedulers (see Scheduling (20)).
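As a hypothetical illustration of this contract (the script names and arguments are assumptions, not the real DVG scripts):

```python
# One init call per DWH load, then one ready call per delivery system.
# The real scripts only write metadata and finish within seconds.
import subprocess

subprocess.run(["./init_dwh_load.sh"], check=True)
for system, mode in [("webshop", "delta"), ("roadshow", "full")]:
    subprocess.run(["./ready.sh", system, mode], check=True)
# the actual loading is then driven by the database-internal scheduler
```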

Crossnative_Orchestration-18

DVG generates all scripts and control information required for the installation and operation of the DWH. This includes scripts for the new installation of the DWH (typical for developer testing) as well as for the migration of an existing DWH (typical in testing, acceptance, production). The migration scripts not only contain the necessary ALTER commands, but also migrate the data content of the affected tables if the logic has changed.
The scripts are packed into a zip file by DVG, and only this zip file is required for the actual deployment.
Deployment is carried out by calling shell scripts (directly from the DVG menu, on the command line, or within CI/CD pipelines).
It is recommended to fully automate the deployment (see PDF).

Crossnative_Deployment-19

No configuration of loading processes and their dependencies is necessary in DVG. There are no corresponding menu items.
The processes to be executed during a load and the dependencies between the processes are derived completely automatically from the defined data model and the data lineage and stored in the DB metadata tables.
The database-internal scheduler is used during loading. In productive operation, neither the DVG tool itself nor any other tool is used. Loading takes place within the database. This has the advantage that there is no communication overhead between the database and other tools.
The scaling/load balancing of the database is used to enable a high degree of parallel loading.
With the help of metadata tables, the current loading status can be tracked at any time.
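A hedged sketch of the underlying idea, deriving a valid execution order from table-level lineage (content invented):

```python
# A process depends on every process that loads one of its source tables;
# a topological sort then yields a valid execution order.
from graphlib import TopologicalSorter

# lineage: process -> (source tables, target table); illustrative content
lineage = {
    "load_stg_product": (set(), "stg_product"),
    "load_hub_product": ({"stg_product"}, "hub_product"),
    "load_sat_product": ({"stg_product"}, "sat_product"),
    "load_dim_product": ({"hub_product", "sat_product"}, "dim_product"),
}

writers = {target: proc for proc, (_, target) in lineage.items()}
deps = {proc: {writers[t] for t in sources if t in writers}
        for proc, (sources, _) in lineage.items()}

print(list(TopologicalSorter(deps).static_order()))
# e.g. ['load_stg_product', 'load_hub_product', 'load_sat_product', 'load_dim_product']
```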

Crossnative_Scheduling-20

Snowflake is currently implemented as the target platform, as Snowflake optimally supports the processing of mass data required in the DWH.
If required, further database platforms can be added as target platforms.


DVG is a browser-based application. All data modelers access the same instance, which is installed with Helm in a Kubernetes cluster. The prerequisite is that access to a Git repository is possible.
Alternatively, DVG can be provided as SaaS.
The DVG tool is only required in the development environment.
A server with an installed snowsql client is required in all environments. This server is used to create and load the DWH generated with DVG (see Deployment (19) and Orchestration (18)).
