My first Publication Agile-Data-Warehouse-Design-eBook | Page 243

Dimensional Design Patterns for Time and Location 223 Solution The FLIGHT FACT table, shown in Figure 7-12, has been modified to contain two extra airport foreign keys, representing the journey origin and journey destination locations not found in the original EMPLOYEE FLIGHT event details. With these additional AIRPORT roles, it suddenly becomes trivial to answer questions about where frequent flyer employees are located (Journey Origin) and where they really have to go to (Journey Destination). These incredibly useful first and last locations are hidden amongst all the flight information but can be found by applying a time- based business rule: “all flights taken by the same employee, no more than four hours apart, are legs of the same journey”. This test would be difficult for BI tools using non-procedural SQL but relatively simple for ETL processes with access to full procedural logic. Overload every fact with the first and last locations within a meaningful sequence Figure 7-12 Flight fact table with dimensional overloading Often, the location of first and last events represent something even more interest- ing than additional where dimensions; they represent why and how; for example, the first web log entry for a visitor arriving at a website contains the URL previ- ously clicked on—this is usually a search engine or banner ad. In which case it represents why the visit took place and contains referral information, such as the advertising partner or search string. Similarly, the last URL visited is significant because it can describe the outcome of the visit—how it went. For example, if the last URL is a purchase checkout confirmation page then the visit was a successful sales transaction and each click leading up to the purchase can be labeled as such. Because Timing-specific first and last locations are so significant they should be attached to all the events in a sequence to help describe events more fully. Do this by overloading the fact table with additional location foreign keys or brand new why and how dimension keys. Consequences Adding useful dimensions from related events is another example of dimensional overloading that requires extra ETL processing and additional fact table storage. In this case, ETL must make multiple passes of the input data to read ahead, decide which events are related and then go back and load the facts with this extra infor- mation. However, this is well worth doing, so that common BI questions can be answered without resorting to complex and inefficient SQL. First and last events often contain why and how details that describe the cause and effect of all the movements within a sequence