My first Publication Agile-Data-Warehouse-Design-eBook | Page 164
Modeling Star Schemas
Insulate the Data Warehouse from Business Key Change
Surrogate keys protect the data warehouse from changes or glitches in the way
business keys are administrated. For example, if business keys change when a
business process is migrated to a new packaged application they can be updated in
a dimension without affecting millions or billions of facts. If business keys are
reused when products are discontinued or customers depart, new dimensional
rows with new surrogate keys can be assigned to these reused codes so they remain
distinct from their previous use. Business key instability like this is often hidden
because it may not cause a problem in the operational systems if older transactions
are archived, but the problem surfaces when the data warehouse has to take a
longer-term view.
Cope with Multiple Business Keys for a Dimension
Surrogate keys allow the data warehouse to integrate events from multiple opera-
tional sources that store information about the same conformed dimensions using
different business keys. By using a surrogate key, you can sidestep the question of
which business key is best for a dimension—a politically sensitive issue within
organizations that have grown by merger or acquisition. The safest answer is “there
is no best business key.” They are all important non-key attributes that should be
stored in the dimension. The multiple business keys will be used by ETL processes
to assign surrogate foreign keys to the facts derived from the multiple sources.
They may also all have analytical value to various stakeholder groups, especially if
some are natural keys that the stakeholders work with outside of their databases.
Track Dimensional History Efficiently
The data warehouse must provide history for the descriptive attributes of slowly
changing dimensions (SCDs) as well as rapidly changing facts. Surrogate keys
provide a simple mechanism for storing this history directly in dimension tables
and efficiently joining the correct historical descriptions to the historical facts at
query time.
Handle Missing Dimensional Values
Every dimension needs at least one special record that represents ‘missing’ or ‘not
applicable’ to cope with errors or minor variations in business events. For example,
in-store and telephone CUSTOMER ORDERS are handled by a SALESPERSON
whereas online orders are not; they do not naturally have a SALESPERSON dimen-
sion. When online orders are loaded into the same fact table as other orders their
Null or missing SALESPERSON IDs are replaced with a special surrogate key value
that points to a “No Salesperson” record in the SALESPERSON dimension. This
allows order queries that group by SALESPERSON to still include online orders
but display them using “Missing” labels defined by stakeholders. If the
SALESPERSON foreign key was left as Null, online orders would always be ex-
cluded. Having a special missing row in every dimension simplifies query joins; all
joins can be inner joins as every fact will always find a matching dimension record.
143
When business
keys are changed or
reused, surrogate
keys prevent facts
from been affected
Surrogate keys
provide a single
primary key for
conformed
dimensions that
have multiple
business keys
originating from
multiple source
systems
Surrogate keys
allow dimensional
history to be tracked
and joined efficiently
to facts
Surrogate keys can
represent special
missing dimension
values for which
there are no
business keys; e.g.,
“No Customer”,
“Missing Date”