My first Publication Agile-Data-Warehouse-Design-eBook | Page 165
144
Chapter 5
Reserve the surrogate key value zero for the default Missing row in each dimen-
sion. You use this row zero to hold the stakeholders’ Missing labels recorded in
the BEAM✲ dimension table example data. If different types of missing are needed
you can add additional special rows, using negative surrogate keys, to represent
“Unknown”, “Not Applicable”, “Error” etc., leaving the normal dimension values
to use positive integers. Being consistent in the use of special value surrogate
keys can greatly simplify ETL processing.
Surrogate keys
enable multi-level
dimensions to
describe variable-
level business
processes
Surrogate keys
keep sensitive
data anonymous.
Unlike business
keys they cannot be
used to join
sensitive data to
source systems that
might provide full
disclosure
Use integer
surrogate keys
instead of
alphanumeric
business keys to
reduce the size of
fact tables and
indexes
Support Multi-Level Dimensions
Some business events have variable-level dimensional details. For example, tele-
sales orders are normally attributed to individual salespeople, but occasionally an
order is attributed to a sales team, when the salesperson is on probation or has left
before the order is processed. Orders are a variable-level who event. By using an
extension of the missing value SK technique, a number of additional special value
rows can be added to the SALESPERSON dimension to represent teams, branches,
regions, or other levels in the sales organization hierarchy, creating a multi-level
dimension to which both normal and exceptional facts can be attached. The multi-
level design pattern is covered in Chapter 6.
Protect Sensitive Information
For data protection or security reasons you might need to create anonymous
customer or employee dimensions for analyzing sensitive purchase habits or salary
payment facts. Anonymized dimensions obviously must not contain name, exact
address, date of birth, or other descriptive attributes that could be used in combi-
nation to identify individuals. However, if they use business keys (such as Cus-
tomer ID or Employee Number) as primary keys, then fact tables will contain
business key foreign keys that can be cross-referenced with other systems to
provide full disclosure. You can prevent this by replacing business keys with
surrogate keys, that do not exist outside of the data warehouse, and limiting access
to the disclosing business key to surrogate key mapping tables, to secure ETL
processes only.
Reduce Fact Table Size
Integer surrogate keys are more compact than datetime keys and most alphanu-
meric business keys—especially so called ‘smart keys’ that have embedded business
logic. This can lead to significant reductions in the size of fact tables. For example,
a typical detailed fact table has 10 dimensions and 5 measures. If the average length
of a business key is 10 characters (bytes), replacing each foreign key with a 4-byte
integer would halve the size of the fact table and its corresponding indexes. Even if
savings can only be made on a few foreign keys, every byte counts when it comes to
fact tables.