Step 1: Select the business Process : we select POS transactions
Step 2: Declare the grain
Capture the grain in the facts at the most detail level. Such most detailed data is called ATOMIC DATA.
Atomic data provides maximum analytic flexibility because it can be constrained and rolled up in every possible way. Detailed data in a dimensional model is poised and ready for the ad hoc attack by business users.
Step 3: Choose the dimensions
Step 4: Identify the facts.
Question: Should a calculated fact be stored in the database?
A) Yes as application logic at different places using the facts can introduce errors.
Better to have one source of calculation , therefore no chance of user error.
Also why should the reporting tool do calculations if can be avoided.
Yes Views can be an alternate solution but in that case users should not be given access to underlying tables in any circumstance.
Dimensions :
----------
Date
Product
POS Transaction Fact
Store
Promotion
Dimensions should have descriptive column values like
Holiday/NonHoliday instead of Y/N
Weekend/Weekday instead of Y/N flags
selling season column can have values like "Valentine's day/Christmas/Thanksgiving" etc
In a report dimension attributes become report headers.
eg total sales by Month by Weekday
Descriptive values help in slicing/dicing data rather than put another decode logic in
reports say weekend or non weekend based on Y/N flag.
Q) Do we need an explicit date dimension. Cant we do with actual date in the fact table instead
of a surrogate date key from date dimension
A) YES
reasons
1) relation databases can easily handle joins, optimizer will do it efficiently based on PK index defined.
2) Most databases will not index date calculations, so queries constraining on date calculations will be slower as will not take advantage of index.
3) Business users are not versed with SQL date semantics like how to determine weekend/weekday
given a date.
4) calendar logic belongs in the dimension table and not in the application code
5) SQL based date key (actual date) is typically 8 bytes so wasting 4 bytes (integer) for every date key if date is used in the facts.
in each row of the fact table.
Better to have a separate time dimension and not combine with date dimension.
Subscribe to:
Post Comments (Atom)
Followers
About Me
- reiki
- Torrance, CA, United States
No comments:
Post a Comment