My first Publication Agile-Data-Warehouse-Design-eBook | Page 27
6
Chapter 1
Advantages of ER Modeling for OLTP
3NF is efficient for
transaction
processing
Normalized databases with few, if any, data redundancies have one huge advantage
for OLTP: they make write transactions (inserts, updates, and deletes) very effi-
cient. By removing data redundancies, transactions are kept as small and simple as
possible. For example, the repeat usage of a service by a telecom’s customer is
recorded using tiny references to the customer and service: no unnecessary details
are rerecorded each time. When a customer or service detail changes (typically)
only a single row in a single table needs to be updated. This helps avoid update
anomalies that would otherwise leave a database in an inconsistent state.
Higher forms of normalization are available, but most ER modelers are satisfied
when their models are in 3NF. There is even a mnemonic to remind everyone that
data in 3NF depends on “The key, the whole key, and nothing but the key, so help
me Codd”—in memory of Edgar (Ted) Codd, inventor of the relational model.
Disadvantages of ER Modeling for Data Warehousing
3NF is inefficient for
query processing
3NF models are
difficult to
understand
History further
complicates 3NF
Even though 3NF makes it easier to get data in, it has a huge disadvantage for BI
and data warehousing: it makes it harder to get the data out. Normalization prolif-
erates tables and join paths making queries (SQL selects) less efficient and harder
to code correctly. For example, looking at the Figure 1-1 ERD, could you estimate
how many ways PRODUCT CATEGORY can be joined to ORDER
TRANSACTION? A physical 3NF version of the model would contain at least 20
more tables to resolve the M:M relationships. Faced with such 3NF databases, even
the simplest BI query requires multiple tables to be joined through multiple inter-
mediate tables. These long joins paths are difficult to optimize and queries invaria-
bly run slowly.
More importantly, queries will only produce the right answers if users navigate the
right join paths, i.e., ask the right questions in SQL terms. If the wrong joins are
used, they unknowingly get answers to some other (potentially meaningless)
questions. 3NF models are complex for both people and machines. Specialist
hardware (data warehouse appliances) is improving query/join performance all the
time, but the human problems are far more difficult to solve. Smart BI software can
hide database schema complexity behind a semantic layer, but that merely moves
the burden of understanding a 3NF model from BI users at query time to BI
developers at configuration time. That’s a good move but its not enough. 3NF
models remain too complex for business stakeholders to review and quality assure
(QA).
ER models are further complicated by data warehousing requirements to track
history in full to support valid ‘like-for-like’ comparisons over time. Providing a
true historical perspective of business events requires that many otherwise simple
descriptive attributes become time relationships, i.e., existing M:1 relationships
become M:M relationships that translate into even more physical tables and com-