My first Publication Agile-Data-Warehouse-Design-eBook | Page 27

6 Chapter 1 Advantages of ER Modeling for OLTP 3NF is efficient for transaction processing Normalized databases with few, if any, data redundancies have one huge advantage for OLTP: they make write transactions (inserts, updates, and deletes) very effi- cient. By removing data redundancies, transactions are kept as small and simple as possible. For example, the repeat usage of a service by a telecom’s customer is recorded using tiny references to the customer and service: no unnecessary details are rerecorded each time. When a customer or service detail changes (typically) only a single row in a single table needs to be updated. This helps avoid update anomalies that would otherwise leave a database in an inconsistent state. Higher forms of normalization are available, but most ER modelers are satisfied when their models are in 3NF. There is even a mnemonic to remind everyone that data in 3NF depends on “The key, the whole key, and nothing but the key, so help me Codd”—in memory of Edgar (Ted) Codd, inventor of the relational model. Disadvantages of ER Modeling for Data Warehousing 3NF is inefficient for query processing 3NF models are difficult to understand History further complicates 3NF Even though 3NF makes it easier to get data in, it has a huge disadvantage for BI and data warehousing: it makes it harder to get the data out. Normalization prolif- erates tables and join paths making queries (SQL selects) less efficient and harder to code correctly. For example, looking at the Figure 1-1 ERD, could you estimate how many ways PRODUCT CATEGORY can be joined to ORDER TRANSACTION? A physical 3NF version of the model would contain at least 20 more tables to resolve the M:M relationships. Faced with such 3NF databases, even the simplest BI query requires multiple tables to be joined through multiple inter- mediate tables. These long joins paths are difficult to optimize and queries invaria- bly run slowly. More importantly, queries will only produce the right answers if users navigate the right join paths, i.e., ask the right questions in SQL terms. If the wrong joins are used, they unknowingly get answers to some other (potentially meaningless) questions. 3NF models are complex for both people and machines. Specialist hardware (data warehouse appliances) is improving query/join performance all the time, but the human problems are far more difficult to solve. Smart BI software can hide database schema complexity behind a semantic layer, but that merely moves the burden of understanding a 3NF model from BI users at query time to BI developers at configuration time. That’s a good move but its not enough. 3NF models remain too complex for business stakeholders to review and quality assure (QA). ER models are further complicated by data warehousing requirements to track history in full to support valid ‘like-for-like’ comparisons over time. Providing a true historical perspective of business events requires that many otherwise simple descriptive attributes become time relationships, i.e., existing M:1 relationships become M:M relationships that translate into even more physical tables and com-