My first Publication Agile-Data-Warehouse-Design-eBook | Page 300
280
Chapter 9
Why and How
Solution
A step dimension
numbers each event
in a sequence
The humble looking STEP dimension, in Figure 9-17, helps BI users understand
sequential behavior. It allows ETL processes to explicitly label events with their
position in a sequence (from its beginning and from its end), along with the length
of the sequence. For example, a web browsing session of four page views by the
same visitor (IP address) within an agreed timeframe would be represented as four
rows in a PAGE VIEWS FACT table. The first page view event would be labeled as
step 1 of 4 by assigning it a STEP KEY of 7 (see Figure 9-17). The next page view
would be labeled as step 2 of 4 using STEP KEY 8, and so on.
Figure 9-17
STEP dimension
A STEP dimension
enables positional
analysis (better
story telling) using
simple single-pass
queries
Step dimensions
can play multiple
roles to describe
sequences within
sequences
Figure 9-18
Using the STEP
dimension to
describe web page
visits
BI users can use the STEP dimension to easily identify page views belonging to
sessions of any length, rank pages by position within sessions, and answer ques-
tions about the beginning, midpoint and ending of sessions for any interesting
subset of customers, time, and products. They can quickly find the good and bad
(“session killer”) last page visits of a session (LAST STEP = “Y”), or those that
precede session killers (STEPS UNTIL LAST = 1) using simple, single-pass SQL.
Answering questions like these without a STEP dimension would be too difficult
for all but the most SQL-savvy BI users.
A STEP dimension can also play multiple roles for an event; for example, Figure 9-
18 shows a PAGE VIEWS FACT table with two STEP dimension roles: STEP IN
SESSION which describes page position within the overall session, and STEP IN
PURCHASE which describes how close each page is to a purchase decision. Each
time a visitor clicks on a link to place a product in a shopping cart, STEP IN
PURCHASE would be reset and the next mini-sequence length calculated.