My first Publication Agile-Data-Warehouse-Design-eBook | Page 166

Modeling Star Schemas 145 Adding surrogate keys to dimensions in addition to their business keys slightly increases the size of the dimensions, but this is insignificant. In general do not worry about the size of dimensions. Although dimension rows can be long with dozens of descriptive attributes, dimensions typically only contain hundreds or thousands of rows in total, whereas fact tables can record millions of rows per day. If you want to see where you should concentrate on saving storage take a look at the Figure 5-8 “scale” diagram of a star schema. In a dimensional data warehouse, fact tables along with their indexes and their staging tables account for 80%–90% of the storage requirements. Don’t worry about Of course not every dimension is small. Customer dimensions that contain indi- vidual consumers can contain tens or hundreds of millions of rows. These need to be treated with the same respect as fact tables and will benefit too from a primary key index based on a compact 4-byte integer rather than a longer “smarter’ alpha- numeric Customer ID that contains a check digit. Chapter 6 covers specific tech- niques for handling very large dimensions (VLDs), also known as “monster dimensions”. …except customer the size of dimension tables… dimensions – they can be big! Figure 5-8 Scale diagram of a star schema by space used For fixed length surrogate keys, a 4-byte integer is suitable for most dimension populations. If you are expecting a crowd (more than 2.1 billion whos or whats) or have specialized calendar dimension requirements (discussed in Chapter 7), you should use an 8-byte integer surrogate key. Improve Query Performance Because integer surrogate keys reduce the size of fact tables and indexes, they help to fit more records into each read operation. This in turn leads to improved star join processing and dramatic improvements in query performance. Enforce Referential Integrity Efficiently Surrogate keys do add an extra layer of complexity to ETL. It is true that fact records can be loaded quicker if business keys are not replaced by surrogates. But this overhead is more than outweighed by the benefits listed above and a byproduct of surrogate keys is efficient referential integrity (RI). Integer SKs join faster than date and character keys SK lookups slow down fact loads but enforce referential integrity (RI)