The Data Vault Handbook - Concepts and Applications

— Row 1: Business key “AAA” appears with new attributes (“Blue”, 100, “Small”). The Hash Diff is new, so the row is inserted into the satellite.

— Row 2: The same business key is loaded again, but the attributes haven’t changed. Since the Hash Diff is the same, the row is not inserted.

— Row 3: One attribute changes to “Green”, resulting in a different Hash Diff. This counts as a delta, so the row is inserted.

— Row 4: The data is identical to the previous day’s load. The Hash Diff doesn’t change, so the row is skipped.
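The four rows above can be replayed in a few lines of Python. This is a minimal sketch, not the book’s loading code. It assumes the Hash Diff is an MD5 digest over the attributes in a fixed order, each value trimmed and upper-cased, with NULL rendered as an empty string and “;” as the delimiter. It also assumes that Row 3 changes the first attribute from “Blue” to “Green”. The names `hash_diff`, `latest_hd`, and `satellite` are hypothetical.

```python
import hashlib

# Assumed normalization rules (hypothetical, not prescribed by the text):
# NULL -> "", trim whitespace, upper-case, join with ";", then MD5.
DELIMITER = ";"


def hash_diff(attributes):
    """Return an MD5 Hash Diff over the descriptive attributes, in a fixed order."""
    normalized = ["" if v is None else str(v).strip().upper() for v in attributes]
    return hashlib.md5(DELIMITER.join(normalized).encode("utf-8")).hexdigest()


# The four daily loads for business key "AAA" from the walkthrough.
loads = [
    ("AAA", ("Blue", 100, "Small")),   # Row 1: first appearance
    ("AAA", ("Blue", 100, "Small")),   # Row 2: unchanged
    ("AAA", ("Green", 100, "Small")),  # Row 3: one attribute changed (assumed first)
    ("AAA", ("Green", 100, "Small")),  # Row 4: unchanged
]

satellite = []   # rows actually inserted: (row_no, key, hash_diff, attributes)
latest_hd = {}   # most recent Hash Diff stored per business key

for row_no, (key, attrs) in enumerate(loads, start=1):
    hd = hash_diff(attrs)
    if latest_hd.get(key) != hd:  # new key or changed attributes -> delta
        satellite.append((row_no, key, hd, attrs))
        latest_hd[key] = hd
        print(f"Row {row_no}: inserted")
    else:
        print(f"Row {row_no}: skipped (same Hash Diff)")
```

Run as written, it reports Rows 1 and 3 as inserted and Rows 2 and 4 as skipped, matching the walkthrough. Upper-casing is a design choice: it makes “Blue” and “blue” hash identically, which suits some sources and hides genuine changes in others.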

In conclusion, rather than comparing each attribute individually, the loading process for our Satellite entities (Section 3.1.3) from the Staging Area to the Raw Data Vault will use the Hash Diff column to detect attribute changes for each business key efficiently. The example includes only three attributes, but the benefit grows with table width: the cost of an attribute-by-attribute comparison rises with every additional column, whereas comparing Hash Diffs remains a single comparison per business key. This matters most for tables with dozens or even hundreds of attributes. The approach simplifies change detection, reduces computational overhead, and improves the overall efficiency of the data pipeline.
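To make the width argument concrete, the sketch below compares the two change-detection strategies on a hypothetical satellite with 200 attributes: attribute-by-attribute comparison, and a single comparison between the Hash Diff stored in the satellite and the Hash Diff carried by the staging row. The keys `K1` to `K3`, the width, the helper names, and the normalization rules (NULL as empty string, trim, upper-case, “;” delimiter, MD5) are all assumptions for illustration.

```python
import hashlib
from typing import Optional, Sequence

WIDTH = 200  # hypothetical: a satellite with 200 descriptive attributes


def hash_diff(attrs: Sequence[Optional[object]]) -> str:
    # Assumed normalization: NULL -> "", trim, upper-case, ";" delimiter, MD5.
    parts = ["" if v is None else str(v).strip().upper() for v in attrs]
    return hashlib.md5(";".join(parts).encode("utf-8")).hexdigest()


def changed_columnwise(old: Sequence, new: Sequence) -> bool:
    # The approach the Hash Diff replaces: up to WIDTH comparisons per key.
    return any(o != n for o, n in zip(old, new))


def changed_by_hash(old_hd: str, new_hd: str) -> bool:
    # One fixed-length comparison per key, however wide the satellite is.
    return old_hd != new_hd


# Current satellite state and today's staging batch for three hypothetical keys.
current = {k: [f"v{i}" for i in range(WIDTH)] for k in ("K1", "K2", "K3")}
staging = {k: list(v) for k, v in current.items()}
staging["K2"][137] = "changed"  # only K2 differs, in a single column

current_hd = {k: hash_diff(v) for k, v in current.items()}  # stored in the satellite
staging_hd = {k: hash_diff(v) for k, v in staging.items()}  # Hash Diff column in staging

# A key missing from the satellite compares "" against a real hash, so it counts as a delta.
deltas = [k for k in staging if changed_by_hash(current_hd.get(k, ""), staging_hd[k])]
print(deltas)  # ['K2']

# Both strategies agree on which keys changed.
assert deltas == [k for k in staging if changed_columnwise(current[k], staging[k])]
```

Both strategies flag only `K2`, but the hash path compares one 32-character value per key instead of up to 200 columns, and it needs only the satellite’s latest Hash Diff per key rather than its stored attribute values. One trade-off of this particular encoding: NULL and an empty string produce the same Hash Diff, and values containing the delimiter can collide with a different split (for example “a;b” + “c” versus “a” + “b;c”), so the normalization rules need to be fixed deliberately.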