My first Publication Agile-Data-Warehouse-Design-eBook | Page 244
224
Chapter 7
When and Where
Time Dimensions or Time Facts?
Flight facts could
be overloaded with
actual departure
and arrival time
dimensions, but
would they be
useful dimensions?
If actual and
scheduled dates
vary by very little
there may be no
value in defining
the actuals as
dimensions
Actual timestamps
make good facts
that can be used to
calculate additional
delay and duration
facts
Fact tables can be
usefully overloaded
with facts calculated
using the next event
In addition to overloading the flight fact table with where dimensions, you might
consider overloading it with when dimensions too—documenting the actual
departure and actual arrival times of each flight, if these were available. This would
allow Pomegranate to measure the on-time performance of airlines. But should
these additional when details be modeled as dimensions? Would they provide
useful new ways of grouping the data in addition to the existing scheduled time
dimensions?
If stakeholders asked for flights to be summarized by ACTUAL ARRIVAL DATE
dimension rather than the SCHEDULED ARRIVAL DATE dimension, it would
make little difference to the answers they saw, unless many flights arrive a day (or
more) late. Even then comparing the two sets of dates dimensionally would pro-
duce skewed measures of airline performance; for example, a flight scheduled to
arrive at 23:59 on March 31 st could be only two minutes late but would be reported
as arriving in a different fiscal quarter. In contrast, a flight scheduled to arrive at
8:55 a.m. could be just over 15 hours late and still roll up to the same day, when
compared using ACTUAL ARRIVAL DATE and SCHEDULED ARRIVAL DATE.
It would appear that the actual arrival and departure dates separated from their
time of day components have no value as dimensions. Creating and indexing
additional foreign keys for them in the fact table would be a waste.
However the actual timestamps values themselves could be held in the fact table
because they are valuable for calculating delays that can used to measure airline
performance (perhaps filtering to ignore two-minute delays but looking for any-
thing over two hours). Better still, the FLIGHT DELAY could be calculated during
ETL and stored as an additive fact along with FLIGHT DURATION, as shown in
Figure 7-12. Both of these facts should be pre-calculated—rather than force the BI
users to perform the time arithmetic, especially because the timestamps involved
are in different time zones!
Figure 7-12 shows one more time-related fact called Layover Duration, which is the
time spent at the arrival airport (or city) before taking the next flight. This is an
example of fact overloading, again performed by ETL reading ahead and picking up
details from the next related event.
The actual departure and arrival dates do not make good additional time dimen-
sions in this particular example because they do not vary significantly from the
scheduled dates—they are usually the same date or one day later. For many other
business processes where actuals do vary significantly from targets or schedules,
actual dates would make very useful time dimensions indeed.