Once set, the Load Date Timestamp should never be altered, preserving a reliable, time-stamped view of the data as it appeared at the moment of loading into the Staging Area. This is also important for implementing the incremental loading of our entities through the different layers.
HASH DIFF
The Hash Diff is an attribute used in satellites to optimize the detection of changes in data rows. It speeds up lookups by providing a quick way to compare rows and identify modifications. The Hash Diff is generated similarly to Hash Keys but, instead of just hashing business keys, descriptive attributes from the source system are concatenated using a delimiter before the hash calculation. Prior to concatenation, leading and trailing spaces are removed, and the data is formatted in a standardized way to ensure consistency.

In practice, the Hash Diff is used to quickly determine whether a change has occurred for a given business key. As new data arrives, a Hash Diff is calculated by hashing the current set of descriptive attributes. This value is then compared to the most recent Hash Diff stored in the satellite for the same business key. If the Hash Diff remains unchanged, no new row is inserted, as there is no detected delta. If the value differs, it indicates that at least one descriptive attribute has changed, and a new record is added to the satellite.
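The calculation and comparison described above can be illustrated with a minimal Python sketch. It is not a reference implementation: the `||` delimiter, the choice of SHA-256 as the hash function, the treatment of missing values as empty strings, and the names `normalize`, `hash_diff`, and `needs_new_satellite_row` are all assumptions made for illustration.

```python
import hashlib

DELIMITER = "||"  # assumed delimiter; any agreed-upon separator would do


def normalize(value):
    """Standardize one attribute value: missing values become an empty
    string (an assumption), everything else is cast to text and trimmed."""
    if value is None:
        return ""
    return str(value).strip()


def hash_diff(row, attributes):
    """Concatenate the normalized descriptive attributes in a fixed order
    and hash the result. SHA-256 is an assumed choice of hash function."""
    payload = DELIMITER.join(normalize(row.get(name)) for name in attributes)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_new_satellite_row(incoming_row, attributes, latest_hash_diff):
    """Return True when the incoming Hash Diff differs from the most recent
    one stored in the satellite for the same business key."""
    return hash_diff(incoming_row, attributes) != latest_hash_diff


# Hypothetical descriptive attributes of a customer satellite.
attributes = ["name", "city"]
stored = hash_diff({"name": "Alice", "city": "Berlin"}, attributes)

# Only whitespace differs, so no delta is detected after trimming.
unchanged = needs_new_satellite_row(
    {"name": "  Alice ", "city": "Berlin"}, attributes, stored
)

# One descriptive attribute changed, so a new satellite row is required.
changed = needs_new_satellite_row(
    {"name": "Alice", "city": "Hamburg"}, attributes, stored
)
```

Passing the attribute list explicitly keeps the concatenation order fixed, which matters because the same values in a different order would yield a different hash.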
THE DATA VAULT HANDBOOK © SCALEFREE INTERNATIONAL GMBH 2025