My first Publication Agile-Data-Warehouse-Design-eBook | Page 243
Dimensional Design Patterns for Time and Location
223
Solution
The FLIGHT FACT table, shown in Figure 7-12, has been modified to contain two
extra airport foreign keys, representing the journey origin and journey destination
locations not found in the original EMPLOYEE FLIGHT event details. With these
additional AIRPORT roles, it suddenly becomes trivial to answer questions about
where frequent flyer employees are located (Journey Origin) and where they really
have to go to (Journey Destination). These incredibly useful first and last locations
are hidden amongst all the flight information but can be found by applying a time-
based business rule: “all flights taken by the same employee, no more than four hours
apart, are legs of the same journey”. This test would be difficult for BI tools using
non-procedural SQL but relatively simple for ETL processes with access to full
procedural logic.
Overload every fact
with the first and last
locations within a
meaningful
sequence
Figure 7-12
Flight fact table
with dimensional
overloading
Often, the location of first and last events represent something even more interest-
ing than additional where dimensions; they represent why and how; for example,
the first web log entry for a visitor arriving at a website contains the URL previ-
ously clicked on—this is usually a search engine or banner ad. In which case it
represents why the visit took place and contains referral information, such as the
advertising partner or search string. Similarly, the last URL visited is significant
because it can describe the outcome of the visit—how it went. For example, if the
last URL is a purchase checkout confirmation page then the visit was a successful
sales transaction and each click leading up to the purchase can be labeled as such.
Because Timing-specific first and last locations are so significant they should be
attached to all the events in a sequence to help describe events more fully. Do this
by overloading the fact table with additional location foreign keys or brand new
why and how dimension keys.
Consequences
Adding useful dimensions from related events is another example of dimensional
overloading that requires extra ETL processing and additional fact table storage. In
this case, ETL must make multiple passes of the input data to read ahead, decide
which events are related and then go back and load the facts with this extra infor-
mation. However, this is well worth doing, so that common BI questions can be
answered without resorting to complex and inefficient SQL.
First and last events
often contain why
and how details that
describe the cause
and effect of all the
movements within a
sequence