Unit 6 -- Advanced Dimensional Modeling Techniques

Lesson 1 of 3
Many-to-Many Dimensions
- Many-to-many Dimensions
- Health care example (see following diagram)
- Diagnosis : Billing = M : N
- A patient has many diagnoses

- Problems of M:N relationships
- When retrieving measures from fact table from a dimension, we need a Weighting
Factor
- Otherwise, incorrect result when more than one dimension factor is in effect
- Looking for rows where a particular combination of diagnosis was related requires
multiple correlated subqueries
- report generation is much more complex
- Need more join
- Problems of Denormalized dimensions using enumerated-value attributes
- Adding a new value requires a DDL to add a column
- Kimball's approach: use Bridge table:

- Diagnosis_group_key, Diagnosis_key, Weighting_factor
- Within each diagnosis group: Sum(Weighting_factor) = 1
- Two types of reports:
- Weighted summary of all charges
- Impact report that does not use the weighting factor
- When a patient will have different diagnosis groups over a period of time:
- Add patient_key, begin_date, end_date
- Why do we need patient_key?
- The PK could be Diagnosis_group_key + Diagnosis_key + Begin_date
Many-to-One-to-Many Traps
- Order and Return Fact tables
- Case when simultaneously joining a single dimension to two fact tables of different
cardinality
- Use multi-pass SQL separately
- Query the Order facts
- Query the Return facts
- Outerjoin the two answer sets on the customer row header
Role-Playing Dimensions
- Multiple rows: a single dimension appears several times in the same fact table
- Example 1: Accumulating Snapshot Fact table:
- Order_date, Package_date, Shipping_date, Payment_date, Customer, Product, Warehouse,
Promotion
- Each date is a separate dimension
- We cannot join all these foreign keys to the same fact table: the time dimension
- SQL will interpret multi-way simultaneous join requiring all the dates to be the same
- Create a separate view for each time with all unique field names
- Example 2: Frequent-flyer flight segment fact table
- Segment_origin_airport, Segment_dest_airport, Trip_origin_airport, Trip_dest_airport
- Example 3: Telecommunication industry
- Source system provider, Local switch provider, Long-distance provider, Value-added
service provider
Organization and Parts
Hierarchies
- A dimension with hierarchical relationship
- Oracle's CONNECT BY cannot be used with a JOIN
- Use Bridge table between the Customer and Fact table

- The bridge contains one record for each pathway, even zero-length pathway from a
customer to itself
- Number_of_level: level of depth
- Bottom_most_flag: the leaf of the tree
- The use of Bridge table is optional
- Customer table can still be directly joined to the Fact table
- Parts explosion
Time Stamping the Changes in a Large Dimension
- Requirements in HR data: three types of queries
- Report summary status of the entire employees on a regular monthly basis
- Report for all possible slices of the data, including time slices, organization slices,
geographic slices
- To slice your data along a particular attribute, make the attribute appear in a
dimension
- Profile the employee population at any precise instant in time
- Historical views including transaction sequence and timing
- How can we design satifying the requirement?
- A single dimensional schema with one fact table and Employee Transaction dimension
- Study the following dimensional model:

- Employee Transaction Dimension: a complete snapshot of the employee record for
each transaction
- Use a surrogate key
- The semantic key: employee_ID + Transaction_data_time
- Note the use of Transaction_end_date_time:
- Equal to the next Transaction_date_time
- These two times define a span of time during which employee desc is exactly correct
- Note the use of Last_transaction_flag: identifies the most up-to-date status quickly!
- Fact table
- All, except balance, are additive
- Testing the three requirements:
- Query 1: Rapidly report summary status of the entire employee base on a monthly basis
- Query 2: Profile the employee population at any precise instant in time, e.g. how many
employees did we have at the end of 1996? AND
- Query 3: Distinctly represent every employee transaction in correct transaction sequence
- Use employee_transaction_dimension
- Could return exactly one record for each employee at the given moment
Building an Audit
Dimension
- Records the lineage of each fact record
- Constructed during the extract process
- One record for each separate run of SW during the extract process
- If a fact table is subject to updating, a M:N bridge table is needed
Too Few Dimensions and Too Many
Dimensions
- Most dimensional schema have 5-15 dimensions
- If too few, consider
- Casual dimensions (promotion, contract, deal, store)
- Time-Stamp dimensions (when the grain is line item)
- Multiple roles
- Status dimensions (status, snapshots)
- Audit dimension
- Degenerate dimension
- Junk dimension
- If too many, check several dimensions
- are not independent
- could be combined together

