When does it make sense to use a Snowflake Schema vs. Star Schema in database design?
What are some practical real-life examples?
Answer:
A star schema is a basic implementation of an OLAP cube. If each dimension in your data warehouse schema has a one-to-many relationship with the fact table, then a star schema is appropriate. If instead your fact table has a many-to-many relationship with a dimension (i.e. many rows in the fact table correspond to many rows in the dimension), then you must resolve this using a snowflake schema in which a bridge table holds a unique key for each row in the fact table.
An example of a one-to-many relationship (star schema) is a fact table that contains sales data and a dimension table that contains a list of stores. One store can have many sales, but each sale comes from only one store, i.e. one row in the dimension table can correspond to many rows in the fact table.
To turn the above example into a snowflake schema: a store can have many sales, but each sale can also come from many stores. This is a many-to-many relationship, and you would need a bridge table to implement this functional requirement.
Many data warehouse guides (including Kimball's The Data Warehouse Toolkit) recommend limiting the use of snowflake schemas.
Ref: http://www.1keydata.com/datawarehousing/concepts.html
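The one-to-many sales/store case described above could be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table names (dim_store, fact_sales), column names, and sample rows are assumptions made up for the example, not part of any real warehouse.

```python
import sqlite3

# In-memory database; all table names, column names, and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension: one row per store.
    CREATE TABLE dim_store (
        store_key  INTEGER PRIMARY KEY,
        store_name TEXT NOT NULL
    );
    -- Fact: one row per sale, each pointing at exactly one store.
    CREATE TABLE fact_sales (
        sale_id   INTEGER PRIMARY KEY,
        store_key INTEGER NOT NULL REFERENCES dim_store(store_key),
        amount    REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO dim_store VALUES (?, ?)",
    [(1, "Downtown"), (2, "Airport")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(100, 1, 20.0), (101, 1, 5.0), (102, 2, 12.5)],
)

# One join from the fact table to the dimension is all a star query needs.
sales_by_store = conn.execute("""
    SELECT s.store_name, COUNT(*), SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_key = f.store_key
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()
print(sales_by_store)
```

Each store row is matched by many sale rows, while every sale row carries exactly one store_key, which is the one-to-many shape the answer describes.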
John Cook at Quora
Other answers
Snowflaking can reduce duplicated or repeated attributes and add some normalization to a star schema. One quick example from the book "Star Schema: The Complete Reference" [1]: a product table might have brand, brand code, and brand manager columns; but as there will be only a few brands, there is no need to duplicate the brand attributes, in which case the brand can be snowflaked out ...
Ref: [1] http://www.amazon.com/Schema-Complete-Reference-Christopher-Adamson/dp/0071744320/
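As a rough sketch of the brand example, here is what moving the brand attributes into their own table might look like. The table and column names (dim_brand, dim_product, brand_manager, and so on) and the sample data are assumptions for illustration; they are not taken from the book.

```python
import sqlite3

# Hypothetical schema: brand attributes are stored once, not once per product.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_brand (
        brand_key     INTEGER PRIMARY KEY,
        brand_code    TEXT NOT NULL,
        brand_name    TEXT NOT NULL,
        brand_manager TEXT NOT NULL
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        brand_key    INTEGER NOT NULL REFERENCES dim_brand(brand_key)
    );
""")
conn.execute("INSERT INTO dim_brand VALUES (1, 'ACM', 'Acme', 'Pat')")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(10, "Anvil", 1), (11, "Rocket", 1), (12, "Magnet", 1)],
)

# A change of brand manager touches one brand row instead of every product row.
conn.execute("UPDATE dim_brand SET brand_manager = 'Sam' WHERE brand_key = 1")

rows = conn.execute("""
    SELECT p.product_name, b.brand_manager
    FROM dim_product AS p
    JOIN dim_brand AS b ON b.brand_key = p.brand_key
    ORDER BY p.product_name
""").fetchall()
print(rows)
```

The trade-off is the extra join: every query that needs a brand attribute now goes through dim_brand.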
Krishna Sankar
I would only use a snowflake when I am extremely limited in the memory available to me, which means when I am working in a prior-generation RDBMS on a 32-bit platform. I would never design such a thing, but if I inherited such a beast I might leave it intact. My preference would be to use a 64-bit operating system with a newer columnar store like (ironically) Snowflake Computing, Vertica, or Redshift. In these cases, I would use wide denormalized fact tables, with very little or no performance penalty.
Snowflake (http://www.snowflake.net/e)
Vertica (http://www.vertica.com/)
Amazon Redshift (http://aws.amazon.com/redshift/)
Michael David Cobb Bowen
I would like to add to the many interesting posts: snowflakes can be helpful where analytic apps want users to consume data in a "drill-down" fashion (i.e. cubes, hierarchies, etc.).
Date dimensions are easy to understand: Year -> Quarter -> Month -> Week -> Day -> Time
Other cubes might be organizational: Global Org -> Regional Org -> Division -> Local
Product data often has a lot of drill-down options: Prod Cat -> Product Group -> Item Size, etc.
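To make the drill-down idea concrete, here is a small sketch of a product hierarchy stored as snowflaked tables, with the same sales rolled up at each level. The three levels (category, group, item), all names, and all figures are assumptions chosen for the example.

```python
import sqlite3

# Hypothetical three-level product hierarchy: category -> group -> item.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    CREATE TABLE dim_product_group (
        group_key    INTEGER PRIMARY KEY,
        group_name   TEXT NOT NULL,
        category_key INTEGER NOT NULL REFERENCES dim_category(category_key)
    );
    CREATE TABLE dim_item (
        item_key  INTEGER PRIMARY KEY,
        item_name TEXT NOT NULL,
        group_key INTEGER NOT NULL REFERENCES dim_product_group(group_key)
    );
    CREATE TABLE fact_sales (
        item_key INTEGER NOT NULL REFERENCES dim_item(item_key),
        amount   REAL NOT NULL
    );
""")
conn.execute("INSERT INTO dim_category VALUES (1, 'Beverages')")
conn.executemany(
    "INSERT INTO dim_product_group VALUES (?, ?, ?)",
    [(10, "Coffee", 1), (11, "Tea", 1)],
)
conn.executemany(
    "INSERT INTO dim_item VALUES (?, ?, ?)",
    [(100, "Espresso", 10), (101, "Latte", 10), (102, "Green tea", 11)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(100, 3.0), (101, 4.0), (101, 4.0), (102, 2.5)],
)

# The same chain of joins serves every level of the drill-down.
JOINS = """
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_key = f.item_key
    JOIN dim_product_group AS g ON g.group_key = i.group_key
    JOIN dim_category AS c ON c.category_key = g.category_key
"""

def total_by(column):
    # `column` is one of the fixed names below, never user input.
    sql = f"SELECT {column}, SUM(f.amount) {JOINS} GROUP BY {column} ORDER BY {column}"
    return conn.execute(sql).fetchall()

by_category = total_by("c.category_name")  # top of the hierarchy
by_group = total_by("g.group_name")        # drill down one level
by_item = total_by("i.item_name")          # leaf level
print(by_category, by_group, by_item)
```

Each table in the chain corresponds to one level a user can drill into, which is why hierarchy-oriented tools often map cleanly onto a snowflaked layout.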
Andrew Hansen
In a snowflake schema, you further normalize the dimensions. Ex: a typical Date dim in a star schema can be further normalized by storing the Quarter dim and Year dim in separate dimension tables. A snowflake schema is generally used if:
1) You have a set of dimension data that you don't need to query frequently but still need for informational purposes. By storing this data in a separate dimension, you reduce redundancy in the main dimensions.
2) You have a reporting or cube architecture that needs hierarchies or a slicing feature.
3) You have fact tables at different levels of granularity. Ex: you have a sales fact table that tracks sales at the product level, and you also have a budget fact table that tracks budgets by product category.
Snowflaking is, however, generally not recommended, because it increases the number of joins and the complexity of your queries and hence slows down performance.
PS: Bridge tables are not snowflakes; they are bridge tables. The purpose of a bridge table is to resolve a many-to-many relationship. A snowflaked dimension, by contrast, stores further (or leaf-level) information about the parent dimension for usability and storage.
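Point 3 above (fact tables at different granularities) might look something like the following sketch. The product category is snowflaked into its own table so that the product-level sales fact and the category-level budget fact can both reach it. All table names, column names, and amounts are assumptions for illustration.

```python
import sqlite3

# Hypothetical schema: sales are tracked per product, budgets per category.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name TEXT NOT NULL
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category_key INTEGER NOT NULL REFERENCES dim_category(category_key)
    );
    CREATE TABLE fact_sales (
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        amount      REAL NOT NULL
    );
    CREATE TABLE fact_budget (
        category_key  INTEGER NOT NULL REFERENCES dim_category(category_key),
        budget_amount REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO dim_category VALUES (?, ?)",
    [(1, "Hardware"), (2, "Software")],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(10, "Keyboard", 1), (11, "Mouse", 1), (12, "Editor", 2)],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?)",
    [(10, 50.0), (11, 25.0), (12, 40.0)],
)
conn.executemany(
    "INSERT INTO fact_budget VALUES (?, ?)",
    [(1, 100.0), (2, 30.0)],
)

# Roll sales up to category level first, then compare with the budget,
# so that joining the two facts does not multiply rows.
budget_vs_actual = conn.execute("""
    SELECT c.category_name, b.budget_amount, COALESCE(s.actual, 0)
    FROM fact_budget AS b
    JOIN dim_category AS c ON c.category_key = b.category_key
    LEFT JOIN (
        SELECT p.category_key, SUM(f.amount) AS actual
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category_key
    ) AS s ON s.category_key = b.category_key
    ORDER BY c.category_name
""").fetchall()
print(budget_vs_actual)
```

Because dim_category is a separate table, the budget fact can reference it directly without pretending to have product-level detail.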
Anonymous
Snowflake is a further normalization of the star schema. You would use it to prevent repetition. An example where this is used is the location dimension. In a sales data warehouse, for example, you might have the following dimensions: User, Account, Lead, Location. But a User, Account, or Lead could each have its own location. So instead of repeating the location attributes in each of these, you would create a foreign key from each of those dimensions to the Location dimension. An ERD showing these relationships would begin to look like a snowflake. IMO, snowflaking is a good practice for a data warehouse in an RDBMS, but not so much in an OLAP database. Feeding an OLAP database denormalized data is a better practice.
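A minimal sketch of the shared Location dimension described above, assuming hypothetical table names (dim_location, dim_user, dim_account, dim_lead) and made-up rows:

```python
import sqlite3

# Hypothetical schema: three dimensions share one location dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_location (
        location_key INTEGER PRIMARY KEY,
        city         TEXT NOT NULL,
        country      TEXT NOT NULL
    );
    CREATE TABLE dim_user (
        user_key     INTEGER PRIMARY KEY,
        user_name    TEXT NOT NULL,
        location_key INTEGER NOT NULL REFERENCES dim_location(location_key)
    );
    CREATE TABLE dim_account (
        account_key  INTEGER PRIMARY KEY,
        account_name TEXT NOT NULL,
        location_key INTEGER NOT NULL REFERENCES dim_location(location_key)
    );
    CREATE TABLE dim_lead (
        lead_key     INTEGER PRIMARY KEY,
        lead_name    TEXT NOT NULL,
        location_key INTEGER NOT NULL REFERENCES dim_location(location_key)
    );
""")
conn.executemany(
    "INSERT INTO dim_location VALUES (?, ?, ?)",
    [(1, "Austin", "US"), (2, "Lyon", "FR")],
)
conn.execute("INSERT INTO dim_user VALUES (1, 'ana', 1)")
conn.execute("INSERT INTO dim_account VALUES (1, 'Initech', 1)")
conn.execute("INSERT INTO dim_lead VALUES (1, 'Globex', 2)")

# City is stored once per location and looked up through the foreign key.
located = conn.execute("""
    SELECT 'user' AS kind, u.user_name AS name, l.city AS city
    FROM dim_user AS u
    JOIN dim_location AS l ON l.location_key = u.location_key
    UNION ALL
    SELECT 'account', a.account_name, l.city
    FROM dim_account AS a
    JOIN dim_location AS l ON l.location_key = a.location_key
    UNION ALL
    SELECT 'lead', d.lead_name, l.city
    FROM dim_lead AS d
    JOIN dim_location AS l ON l.location_key = d.location_key
    ORDER BY kind
""").fetchall()
print(located)
```

In a denormalized design, city and country would instead be repeated as columns in each of the three dimension tables.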
David Badenchini