Unit 4. -- Approaches to Working
with Dimensions
Slowly
Changing Dimensions
- Potential changes of dimensions over time
in Customer or Product
- PKs
do not change, but the description of the product
or customer may change:
- Customers are changing
address, income, marital status, credit
level, etc.
- Sales forces change the
names of districts and regions
TYPE ONE: Overwrite the old
value in the dimension record.
- Lose the ability to track the old
history.
- Use when the old value has no
significance
- Correct errors
TYPE TWO: Add an additional
dimension record using a new surrogate key value.
- Segment history.
- Track the change in an attribute
within a dimension.
- Represents a true
"physical" change to the dimension
entity.
- Need to use a surrogate key
- May use an effective begin date
and an end date
TYPE THREE: Add an
"old" field within the original
dimension to record the immediate previous attribute value
- Does not segment history.
- Used in "soft" or
"tentative" change as in redefinition
of product category boundaries.
- Used
when we wish to keep tracking history with both
old and new values.
- Also add Effective_Date field.
- Rarely used
Rapidly
Changing Small Dimensions
- Type 2 technique is still recommended
Large
Dimensions
- Two big dimensions with potentially
several million records: Customer and Product
- Issues:
- Support rapid browsing of the
unconstrained dimension
- Support efficient browsing of
cross-constrained values
- Query processing with the large
dimension
- Find and suppress duplicate
entries in the dimension
- Creating additional records to
handle the slowly changing dimensions
- Rapidly Changing Monster Dimensions as in
Customer
- Type 2 technique cannot be used
b/c of size
- Create a separate dimension with
frequently changing attributes
- Numerical data must be
converted into banded values
- Need to have all possible
discrete attribute combinations
- If each of 5
attributes has 10 possible
values, then 105
records.
- Also need a
surrogate demographic key
- Build every
possible demographics combination
at table creation time

- Disadvantages of separate dimensions
- Problem of converting into a
banded value
- Could end up with too many
combinations
- Create a second
demographic dimension
- Cross-browsing between customer
and demographic gets slow
- Can associate demographic data
only when there is an event
- Add a demographic key to
the customer dimension
Degenerate Dimensions
- When the grain of a fact table is
line-item
- Order number is a degenerated
dimension.
- The PK of the fact table consists
of all FKs + degenerated_dimension + line_item.
- Occurs in Order, Invoice, Bill of lading,
ticket
- The degenerated dimension key should be
the actual production order number.
- The schema with a degenerated dimension is
called a multi-star schema.
Junk
Dimensions
- A technique for handling left-over
attributes from the fact table
- An artificial dimension that contains:
- A group of random flags taken from
a fact table
- 10 flags: 210 =
1024 records
- An open-ended comment field
attached to a fact record
- Will contain all the
distinct comments
- Needs "No
Comment" record in the junk
dimension
Foreign
Keys, Primary Keys, and Surrogate Keys
- All dimensions must have single attribute
meaningless surrogate keys.
- Do not use the original production
keys.
- A 4-byte integer can support 2
billion positive integers.
- Avoid smart keys or intelligent
keys.
- DATE Keys
- Also use a surrogate key
- A SQL DATE is usually 8 bytes (7
bytes in Oracle)
- It allows us to represent:
- I dont know
- It hasnt happened
yet
- Not applicable
- Time stamp should be a separate
field from the Date field.
Additivity of Measures
- Additivity: add the facts along all
of the dimensions
- Numerical measures of intensity are not
always additive.
- Ex: Account balance, inventory
level
- They are additive along all the
dimensions except time.
1.
Fully-additive:
- Can add
across any dimensions
- Ex:
sales_dollars, cost, qty_sold, margin_dollars
2.
Semi-additive:
- Can add
across some dimensions, but not all
- Semi-additivity
occurs when measures represent levels.
- Semiadditive
facts must be averaged across time when used
across time.
- Ex:
bank_balance, head_counts, invenstory_quantities
. All
balances can be added over all accounts for each branch.
. All
balances can be added for all accounts for each customer.
. Balances
cannot be added over time; we have to divide the sum by
#days.
3.
Non-additive:
- Cant
add across dimensions
- Percentages
(or rates) are non-additive.
- Temperatures
are non-additive.
- Ex:
margin_rate
- Break
down into additive components
(Keep
Margin_dollars and Sales_Dollars, instead of Margin_rate)