DATA VAULT: THE FOUNDATION FOR RELIABILITY
PRINCIPLES OF DATA VAULT
Data Vault is a modern data modeling methodology and architecture, developed by Dan Linstedt, designed for scalability, auditability, flexibility, and long-term historical storage. Unlike older techniques, it directly addresses the challenges of today’s complex data landscape. Its fundamental modeling constructs are:
Hubs: Represent core business entities (e.g., Customer, Product). They store unique business keys, surrogate hash keys, and metadata, ensuring a single, stable reference point for each business concept across all systems.
Links: Model relationships between Hubs (e.g., Order connecting Customer and Product). Links store the surrogate hash keys of connected Hubs and metadata. This allows for extreme flexibility, easy addition of new relationships, and historical tracking of associations.
Satellites: Store descriptive attributes of Hubs or Links over time (e.g., customer address). They capture changes by adding new rows with load timestamps and source identifiers, ensuring complete historization of attributes. Satellites can segment data by change rate or source.
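The three constructs above can be sketched as plain records. This is a minimal illustration, not a reference implementation: the class names, field names, the `hash_key` and `append_if_changed` helpers, the SHA-256 choice, and the key normalization (trim plus upper-case) are all assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic surrogate hash key from one or more business keys.

    Normalization and SHA-256 are illustrative assumptions.
    """
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class HubRow:
    hub_hash_key: str      # surrogate hash key
    business_key: str      # unique business key
    load_ts: datetime      # metadata: when the row was loaded
    record_source: str     # metadata: where the row came from


@dataclass(frozen=True)
class LinkRow:
    link_hash_key: str
    hub_hash_keys: tuple   # hash keys of the connected Hubs
    load_ts: datetime
    record_source: str


@dataclass(frozen=True)
class SatelliteRow:
    parent_hash_key: str   # the Hub or Link being described
    load_ts: datetime
    record_source: str
    attributes: tuple      # sorted (name, value) pairs


def append_if_changed(history, parent_key, attributes, load_ts, source):
    """Append a Satellite row only if the attributes differ from the latest row.

    Assumes `history` is appended in chronological order. Existing rows are
    never updated or deleted, so the full history is preserved.
    """
    new_attrs = tuple(sorted(attributes.items()))
    same_parent = [r for r in history if r.parent_hash_key == parent_key]
    if same_parent and same_parent[-1].attributes == new_attrs:
        return False
    history.append(SatelliteRow(parent_key, load_ts, source, new_attrs))
    return True


t1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
t2 = datetime(2024, 2, 1, tzinfo=timezone.utc)

customer = HubRow(hash_key("C-1001"), "C-1001", t1, "crm")
product = HubRow(hash_key("P-77"), "P-77", t1, "erp")
order = LinkRow(hash_key("C-1001", "P-77"),
                (customer.hub_hash_key, product.hub_hash_key), t1, "erp")

history = []
first = append_if_changed(history, customer.hub_hash_key, {"address": "1 Main St"}, t1, "crm")
repeat = append_if_changed(history, customer.hub_hash_key, {"address": "1 Main St"}, t2, "crm")
changed = append_if_changed(history, customer.hub_hash_key, {"address": "2 Oak Ave"}, t2, "crm")
```

In this sketch the unchanged address produces no new row, while the changed address is appended alongside the earlier one, so both values stay queryable with their load timestamps.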
Data Vault embodies “a single version of the facts” by retaining all raw data from all sources, even when sources conflict. Unlike approaches that force a “single version of truth” upfront, Data Vault keeps every conflicting version and leaves conflict resolution to downstream models (the Business Vault or data marts). This design makes the Data Vault inherently auditable and traceable, providing full data lineage within the warehouse. More than just a model, Data Vault is a “system of business intelligence” with a three-layer architecture:
Staging Area: Ingests raw data with minimal transformation. Computes hash keys and adds metadata.
Enterprise Data Warehouse (EDW): Divided into Raw Data Vault (for raw data integration) and an optional Business Vault (for derived calculations).
Information Delivery Layer: Includes data marts and consumption endpoints built from the EDW.
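The flow through the three layers, including how conflicting source values are kept in the Raw Data Vault and only resolved downstream, might look like the following sketch. The function names, the dictionary-based rows, the `crm` and `billing` source names, and the source-priority rule are hypothetical and chosen only for illustration.

```python
import hashlib
from datetime import datetime, timezone


def stage(raw_row, business_key_field, record_source, load_ts):
    """Staging Area: leave raw values untouched, add a hash key and load metadata."""
    key = str(raw_row[business_key_field]).strip().upper()
    staged = dict(raw_row)
    staged["hash_key"] = hashlib.sha256(key.encode("utf-8")).hexdigest()
    staged["load_ts"] = load_ts
    staged["record_source"] = record_source
    return staged


def load_raw_vault(raw_vault, staged_rows):
    """Raw Data Vault: insert-only, so conflicting rows from different sources all remain."""
    raw_vault.extend(staged_rows)


def resolve_address(raw_vault, source_priority):
    """Business Vault: a derived rule that picks one address per customer.

    The priority rule is an assumption; any business rule could be applied here.
    """
    resolved = {}
    ordered = sorted(raw_vault, key=lambda r: source_priority.index(r["record_source"]))
    for row in ordered:
        resolved.setdefault(row["hash_key"], row["address"])
    return resolved


ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
raw_vault = []
load_raw_vault(raw_vault, [
    stage({"customer_id": "C-1001", "address": "1 Main St"}, "customer_id", "crm", ts),
    stage({"customer_id": "c-1001", "address": "2 Oak Ave"}, "customer_id", "billing", ts),
])

# Conflict resolution happens downstream of the Raw Data Vault.
resolved = resolve_address(raw_vault, source_priority=["billing", "crm"])

# Information Delivery Layer: a presentation-ready mart built from the EDW.
customer_mart = [{"customer_hash_key": k, "address": a} for k, a in resolved.items()]
```

Because the Raw Data Vault still holds both addresses with their sources and timestamps, the priority rule can be changed later and the mart rebuilt without reloading any source data.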
This layered approach separates data integration from presentation, enhancing maintainability and flexibility. Data Vault’s structure also enables parallel loading and massive scalability, making it ideal for cloud MPP (massively parallel processing) platforms and handling huge data volumes with high velocity. These principles collectively make Data Vault a robust foundation for a reliable data ecosystem, built for auditing, data tracing, loading speed, and resilience to change. It provides a blueprint for handling new sources, requirements, and organizational changes gracefully, ensuring data correctness and lineage.
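One way to picture the parallel loading mentioned above: if every loader derives its hash keys directly from business keys in the staged rows, no loader needs to wait for another to assign keys. The text does not spell out this mechanism, so treat it as an assumption; the `hk` helper, the loader functions, and the `STAGED` sample rows are likewise hypothetical.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hk(*keys):
    """Deterministic hash key from business keys (illustrative normalization)."""
    joined = "||".join(k.strip().upper() for k in keys)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


STAGED = [
    {"customer_id": "C-1", "product_id": "P-1"},
    {"customer_id": "C-2", "product_id": "P-1"},
]


def load_customer_hub(rows):
    return {hk(r["customer_id"]) for r in rows}


def load_product_hub(rows):
    return {hk(r["product_id"]) for r in rows}


def load_order_link(rows):
    # The Link computes its parents' hash keys itself instead of looking
    # them up in the Hubs, so it has no load-order dependency on them.
    return {
        (hk(r["customer_id"], r["product_id"]), hk(r["customer_id"]), hk(r["product_id"]))
        for r in rows
    }


loaders = [load_customer_hub, load_product_hub, load_order_link]
with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(loader, STAGED) for loader in loaders]
    customer_hub, product_hub, order_link = [f.result() for f in futures]
```

The threads here only stand in for independent load jobs; on an MPP platform the same independence would apply to separate load statements or pipelines.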
ARCHITECTING THE FUTURE OF ENTERPRISE DATA