My first Publication Agile-Data-Warehouse-Design-eBook | Page 133
Chapter 4
Generalization produces data models that are difficult for BI users to understand and query
Agile data warehouse modelers must use generalization carefully. Data models that
value flexibility over simplicity are notoriously difficult to understand and use for BI.
They can work for transactional software products because their data structures are
completely hidden from the users by application interfaces. But “universal data
models” that rely on high levels of generalization or abstraction do not work so well
for BI users who, despite the semantic layers provided by BI tools, need far
simpler data warehouse designs to be able to construct and run ad-hoc queries
efficiently.
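To make the query burden concrete, here is a minimal sketch that asks the same question ("revenue by customer") of two designs. Everything in it is a hypothetical illustration rather than a schema from this book: the generalized tables (party, event, event_party), the specific tables (customer_dim, order_fact), and the sample rows are all assumptions, and SQLite is used only because it ships with Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical generalized design: every who is a party, every activity an event.
CREATE TABLE party (party_id INTEGER PRIMARY KEY, party_name TEXT);
CREATE TABLE event (event_id INTEGER PRIMARY KEY, event_type TEXT, amount REAL);
CREATE TABLE event_party (event_id INTEGER, party_id INTEGER, role_name TEXT);

-- Hypothetical specific design: a customer dimension and an orders fact table.
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE order_fact (customer_key INTEGER, revenue REAL);
""")

conn.executemany("INSERT INTO party VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob"), (3, "Carla")])
conn.executemany("INSERT INTO event VALUES (?, ?, ?)",
                 [(10, "ORDER", 100.0), (11, "ORDER", 50.0), (12, "RETURN", 20.0)])
conn.executemany("INSERT INTO event_party VALUES (?, ?, ?)",
                 [(10, 1, "CUSTOMER"), (10, 3, "SALESPERSON"),
                  (11, 2, "CUSTOMER"), (11, 3, "SALESPERSON"),
                  (12, 1, "CUSTOMER")])
conn.executemany("INSERT INTO customer_dim VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO order_fact VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0)])

# Generalized model: the user must know which event type and which role
# to filter on before the numbers mean anything.
generalized_rows = conn.execute("""
    SELECT p.party_name, SUM(e.amount)
    FROM event e
    JOIN event_party ep ON ep.event_id = e.event_id
    JOIN party p ON p.party_id = ep.party_id
    WHERE e.event_type = 'ORDER' AND ep.role_name = 'CUSTOMER'
    GROUP BY p.party_name
    ORDER BY p.party_name
""").fetchall()

# Specific model: the table and column names already carry the meaning.
specific_rows = conn.execute("""
    SELECT c.customer_name, SUM(f.revenue)
    FROM order_fact f
    JOIN customer_dim c USING (customer_key)
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

print(generalized_rows)
print(specific_rows)
```

Both queries return the same rows, but the generalized version only does so if the user remembers both filters. Dropping either one silently mixes returns or salespeople into the totals, which is the kind of mistake an ad-hoc query author is likely to make.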
Modelstorming data requirements specifically rather than generally promotes stakeholder design ownership
One of the great benefits of modelstorming is that stakeholders feel a sense of
ownership in the resulting design. If they have abstractions forced upon them they
start to lose that feeling: it’s no longer their model, their data; it could be anyone’s.
The only Party Roles most stakeholders recognize are Host, Guest, or Gatecrasher
(or maybe political ones if that’s their specialist field). In extreme cases
where generalization is taken too far, to the point where the data model can be used
to represent almost anything, it will actually mean nothing to stakeholders. This
defeats the goal of modelstorming, which is not to design data structures that merely
store data but to design ones that stakeholders will use and cherish. Modeling each
interesting who, what, when, where, why and how as specifically as possible helps
to promote the data model understanding needed to construct meaningful queries
and interpret their results.
Postpone ‘technical benefit only’ generalization until star schema design
Stakeholders are happy with “reasonable” levels of generalization if they can see an
obvious business benefit such as a better understanding of the commonalities
(conformance) between business processes that improves analysis. But if the
benefits are purely technical (to cut down database administration or streamline
ETL), then you should postpone generalization until you design your star schemas
and ETL processes.
Discovering Process Sequences
Conformed why and how dimensions often indicate a process sequence
The last two Ws, why and how, are grouped together on the matrix because of their
similarities and close relationships within processes. Whys and hows are the most
common types of non-conformed dimension but when they are conformed they
can often change type, from how to why and vice versa. This happens when events
have a cause-and-effect relationship, which often represents a process sequence. You
discover just such a sequence if you ask:
Why does a warehouse worker ship a product?
and get the answer:
Because a customer ordered the product.
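The question and answer above can be sketched in code. This is only an illustration under stated assumptions: the event rows, the field names, and the use of an order_id as the shared (conformed) detail are hypothetical, chosen to show how a detail that describes how an order happened can serve as the why of a later shipment.

```python
# Hypothetical order events: order_id acts as a "how" detail of each order.
orders = [
    {"order_id": "ORD1", "customer": "Alice", "product": "Widget"},
    {"order_id": "ORD2", "customer": "Bob", "product": "Gadget"},
]

# Hypothetical shipment events: the same order_id reappears here as the "why".
shipments = [
    {"shipment_id": "SH1", "order_id": "ORD1", "worker": "Wes", "product": "Widget"},
    {"shipment_id": "SH2", "order_id": None, "worker": "Wes", "product": "Sample"},
]

# Index the earlier event by the shared detail.
orders_by_id = {order["order_id"]: order for order in orders}


def why_shipped(shipment):
    """Return the cause of a shipment, or None if no order explains it."""
    order = orders_by_id.get(shipment["order_id"])
    if order is None:
        return None
    return f"Because {order['customer']} ordered the {order['product']}."


for shipment in shipments:
    print(shipment["shipment_id"], "->", why_shipped(shipment))
```

Following the shared detail from shipments back to orders places the two events in order: the order comes first and the shipment is its effect. A shipment with no matching order, like the second row, has no such cause in this data.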