The Data Vault Handbook - Concepts and Applications

2.1. STAGING AREA

The Staging Area is the initial layer in the Data Vault architecture, responsible for collecting and storing raw, unmodified data from various source systems. Its primary purpose is to offload the operational burden from these systems and bring the data under our own control, serving as a centralized repository where data is gathered before any significant transformations occur. This layer keeps the data as close to its original form as possible, preserving its integrity for auditability, historization, and subsequent processing.

In the Staging Area, only hard rules are applied (Section 1.3.1), so that the data’s fundamental meaning is not modified. Instead, these rules address technical requirements to facilitate storage and processing. Examples of hard rules include aligning data types, computing hashes (Hash Keys & Hash Diffs), and adding technical fields such as Record Source and Load Date Timestamp. By focusing solely on these hard rules, the Staging Area prepares the data for further processing in the subsequent layers of the Data Vault architecture.

2.1.1. TRANSIENT AND PERSISTENT STAGING AREA

When designing a Staging Area, a crucial decision is how long data should be retained in this layer. Generally, you can choose to keep the complete history of your data sources (Persistent Staging Area) or retain data only for a limited period, such as a single load cycle or a specific number of days (Transient Staging Area). Each approach comes with its own set of advantages and disadvantages.
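To make the hard rules and the retention choice more concrete, the following is a minimal sketch in Python, not a reference implementation. It assumes MD5 as the hash function, "||" as the column delimiter, and trimming plus upper-casing of business keys before hashing. The function names (stage_record, purge_transient), the column names, and the sample source "crm.customer" are all hypothetical and chosen only for illustration. Data type alignment is left out for brevity.

```python
import hashlib
from datetime import datetime, timedelta, timezone

DELIMITER = "||"  # assumed separator between hashed column values


def _normalize(value):
    """Render a value as trimmed text; None becomes an empty string."""
    return "" if value is None else str(value).strip()


def hash_columns(values, upper=False):
    """Hash delimiter-joined, normalized values (MD5 is an assumption)."""
    parts = [_normalize(v) for v in values]
    if upper:
        parts = [p.upper() for p in parts]
    return hashlib.md5(DELIMITER.join(parts).encode("utf-8")).hexdigest()


def stage_record(raw, business_keys, record_source, load_date):
    """Apply hard rules only: copy source values unchanged, add technical columns."""
    descriptive = [k for k in sorted(raw) if k not in business_keys]
    staged = dict(raw)  # the source values themselves are not altered
    staged["hash_key"] = hash_columns([raw[k] for k in business_keys], upper=True)
    staged["hash_diff"] = hash_columns([raw[k] for k in descriptive])
    staged["record_source"] = record_source
    staged["load_date"] = load_date
    return staged


def purge_transient(staged_rows, now, retention):
    """Transient staging: keep only rows loaded within the retention window."""
    cutoff = now - retention
    return [r for r in staged_rows if r["load_date"] >= cutoff]


# Hypothetical usage: two deliveries of the same customer, nine days apart.
now = datetime(2025, 1, 10, tzinfo=timezone.utc)
old = stage_record({"customer_id": " c-1 ", "name": "Ada"},
                   ["customer_id"], "crm.customer", now - timedelta(days=9))
new = stage_record({"customer_id": "C-1", "name": "Ada L."},
                   ["customer_id"], "crm.customer", now)

persistent = [old, new]  # Persistent Staging Area: full history is kept
transient = purge_transient(persistent, now, timedelta(days=7))  # only `new` remains
```

In this sketch both deliveries resolve to the same Hash Key because the business key is normalized before hashing, while the differing Hash Diff signals that a descriptive attribute changed. The raw values are stored exactly as delivered, which is the property the Staging Area is meant to protect.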


THE DATA VAULT HANDBOOK © SCALEFREE INTERNATIONAL GMBH 2025
