My first Publication Agile-Data-Warehouse-Design-eBook | Page 190

170 Chapter 6 Mini-dimensions create additional fact table keys Banding causes some loss of dimensional detail but creates better attributes for report grouping and filtering Who and What Banding continuous valued attributes, such as INCOME or CREDIT SCORE, causes historical detail to be lost. Having said that, these high cardinality details do not make good dimensional attributes. Remember, the role of a dimensional is to provide good report row headers and filters. A high cardinality INCOME attribute is a poor report row header. Precise income figures (if you actually knew them) are virtually unique to customers and would group very little data, resulting in unread- able long reports—more like database dumps than BI reports. Low cardinality banded attributes, such as INCOME BAND make far better row headers, just so long as the bandings are carefully designed with the stakeholders. When stakeholders give you examples of high cardinality dimensional attributes you should ask them for the rollup descriptions or bands they would like to see on their reports. When they define their favorite numeric bandings make sure the bands are contiguous—no gaps and no overlaps. If stakeholders need access to the history of continuous valued customer details, you should model these not as dimensional attributes, but as facts in a customer profile fact table. Sensible Snowflaking Pattern Problem Large numbers of low cardinality attributes can waste space in very large dimensions Stakeholders want to use customer first purchase date in conjunction with com- mercially available geodemographic information to segment the customer base for marketing purposes. Adding all the necessary descriptive attributes related to first purchase date (first purchase quarter, first purchase on a holiday indicator etc,) and geodemographic code would greatly increase the size of an already very large customer dimension. Stakeholders are worried that existing queries, that don’t need the new marketing attributes, will be adversely affected. Solution “Snowflake” the CUSTOMER dimension. Move large collections of low cardinality attributes to outriggers Generally, it’s a good idea to denormalize as many descriptive attributes as possible into a dimension to simplify the model and improve query performance. But CUSTOMER dimensions, because of their size, are exceptional and can often benefit from sensible normalization or “snowflaking”. The FIRST PURCHASE DATE and GEODEMOGRAPHICS outriggers, shown in Figure 6-3, represent sensible snowflaking because they avoid a large number of much lower cardinality date and geodemographic attributes being embedded in the CUSTOMER dimen- sion. Keeping these attributes separate will make a worthwhile storage saving that will improve query performance—especially for all the queries that are not inter- ested in the first purchase dates or geodemographics. In this specific case the use of an outrigger for FIRST PURCHASE DATE makes even more sense as it can be implemented as a role-playing view of the standard CALENDAR dimension.