Know the Reasons Why Snowflake Schema Is Best for You
A data warehouse hosts historical transaction records and helps a company gain a better understanding of its business processes. A schema describes how the data in a database or data warehouse is organized and how its tables relate to one another; it is often drawn as a diagram. A snowflake schema is a well-known structure used in data warehousing systems.
Key characteristics of a data warehouse include the following:
What Is a Snowflake Schema?
A snowflake schema is a standard data warehousing schema. To understand the schema, you first need to understand a few key concepts.
A snowflake schema mimics a transactional system in terms of normalization. Normalization is the process of eliminating redundancy from your database by separating business entities into their own tables and associating them with each other through relationships. Before diving into snowflake schemas, it's important to understand some fundamentals of two major business database architectures: OLTP and OLAP.
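To make the idea of normalization concrete, here is a minimal Python sketch. The sample records, the field names, and the normalize helper are all hypothetical and only illustrate the principle: the repeated customer details are split into their own table and linked back to the orders through a customer_id.

```python
# Hypothetical flat order records: customer details repeat on every row.
flat_orders = [
    {"order_id": 1, "customer": "Acme", "city": "Austin", "amount": 120.0},
    {"order_id": 2, "customer": "Acme", "city": "Austin", "amount": 80.0},
    {"order_id": 3, "customer": "Birch", "city": "Boston", "amount": 45.0},
]


def normalize(rows):
    """Split flat rows into a customer table and an order table linked by customer_id."""
    customer_ids = {}  # (name, city) -> customer_id
    orders = []
    for row in rows:
        key = (row["customer"], row["city"])
        # Assign the next id the first time a customer is seen, reuse it afterwards.
        customer_id = customer_ids.setdefault(key, len(customer_ids) + 1)
        orders.append(
            {"order_id": row["order_id"], "customer_id": customer_id, "amount": row["amount"]}
        )
    customers = [
        {"customer_id": cid, "name": name, "city": city}
        for (name, city), cid in customer_ids.items()
    ]
    return customers, orders


customer_table, order_table = normalize(flat_orders)
```

After the split, each customer's name and city are stored once, and the three orders reference them by key.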
Understanding OLTP and OLAP Database Environments
IT systems have what is known as a "back-end," or some type of data repository that stores information from transactional or historical events.
Online Transaction Processing (OLTP) systems handle large numbers of transactions and focus on maintaining data integrity across all systems. OLTP systems are optimized for handling inserts, updates, and deletes.
Online Analytical Processing (OLAP) systems handle a low volume of transactions. They collect and store historical transactions from OLTP systems in data warehouses. They are optimized to handle complex queries and help the business make strategic decisions based on data analysis.
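The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the orders table and its rows are made up, and a real OLTP system and data warehouse would normally be separate databases rather than one in-memory table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT)"
)

# OLTP-style work: many small writes, each touching one row or a few rows.
conn.executemany(
    "INSERT INTO orders (order_id, region, amount, status) VALUES (?, ?, ?, ?)",
    [(1, "East", 100.0, "new"), (2, "West", 250.0, "new"), (3, "East", 50.0, "new")],
)
conn.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 2")
conn.execute("DELETE FROM orders WHERE order_id = 3")
conn.commit()

# OLAP-style work: a read-only query that aggregates over the stored history.
totals_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```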
OLAP environments are closely associated with data warehousing. One significant difference between data warehousing systems and transactional systems is that data warehousing systems are not usually normalized to third normal form (3NF). Here are some other differences:
A data warehouse is a repository that holds an enterprise's historical data, organized in a way that helps the company make business decisions. The data comes from different business unit applications, such as marketing, sales, finance, customer relationship management, and accounting systems.
A data warehouse pulls the information from all types of data sources including spreadsheets and other databases such as external SQL Server databases. The data goes through formatting and import processes so it fits the data warehouse schema.
A data warehousing schema usually includes three table types: staging, user-maintained, and reporting tables. Data is first loaded into staging tables, where it is cleaned before being added to the other tables. A staging area is necessary to make sure the data from different systems aligns, and it simplifies the loading process.
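As a rough sketch of what cleaning data in a staging area can involve, the following Python snippet trims whitespace, standardizes casing, and drops rows with missing or duplicate keys. The clean_staging_rows function, the field names, and the sample rows are hypothetical; real cleaning rules depend on the source systems involved.

```python
def clean_staging_rows(staging_rows):
    """Return rows that are safe to load, with consistent formatting."""
    cleaned = []
    seen_ids = set()
    for row in staging_rows:
        customer_id = str(row.get("customer_id", "")).strip()
        if not customer_id or customer_id in seen_ids:
            continue  # skip rows with a missing key or a key already loaded
        seen_ids.add(customer_id)
        cleaned.append(
            {
                "customer_id": customer_id,
                "name": row.get("name", "").strip().title(),
                "country": row.get("country", "").strip().upper(),
            }
        )
    return cleaned


# Hypothetical rows arriving from two source systems with inconsistent formatting.
staging = [
    {"customer_id": " 17 ", "name": "  ada lovelace", "country": "uk"},
    {"customer_id": "17", "name": "Ada Lovelace", "country": "UK"},  # duplicate key
    {"customer_id": "", "name": "unknown", "country": "us"},  # missing key
    {"customer_id": "42", "name": "grace hopper ", "country": "us "},
]
reporting_rows = clean_staging_rows(staging)
```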
The user-maintained table contains data that does not go through the standard transformation process. It is usually data provided by users that exists nowhere else. Not all data starts "neatly" in a user-defined table.
You may find that some departments transform data outside the system, so it is vital that you define your schema structure so that all your data is "scrubbed."
The reporting (or presentation) tables host most of the data that users query. This is the clean data, optimized for reporting.
Typical Data Warehousing Tasks
Regardless of the schema you use, there are common data warehousing tasks when building a data warehouse.
The Star and Snowflake Schemas
The simplest data warehouse dimensional model is the star schema. As the name implies, it looks like a star: one fact table in the center, relationally joined to the dimension tables around it.
The fact table is made up of attributes that describe an event, such as quantities and amounts, and foreign keys that reference the surrounding dimension tables. Its primary key is typically a composite of those foreign keys. The dimensions hold data about the business objects involved in the event or transaction. Since the star schema is flat and denormalized, queries need few joins and generally perform well.
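A star schema of this shape can be sketched with Python's built-in sqlite3 module. The table names (fact_sales, dim_product, dim_customer), columns, and rows are assumptions made for illustration only; the point is the single fact table whose primary key is built from the dimension foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT
);
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY, customer_name TEXT, city TEXT
);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    quantity INTEGER,
    amount REAL,
    PRIMARY KEY (product_id, customer_id)
);
INSERT INTO dim_product VALUES (1, 'Laptop', 'Hardware'), (2, 'Antivirus', 'Software');
INSERT INTO dim_customer VALUES (10, 'Acme', 'Austin'), (11, 'Birch', 'Boston');
INSERT INTO fact_sales VALUES (1, 10, 2, 2000.0), (2, 10, 5, 250.0), (1, 11, 1, 1000.0);
""")

# One join from the fact table to a flat dimension answers the question.
sales_by_category = conn.execute("""
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()
```

In practice a fact table usually also references a date dimension, which is left out here to keep the sketch short.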
The snowflake schema extends the star schema. Instead of keeping each dimension in a single table, it normalizes the dimension tables into multiple lookup tables. This process of normalizing dimension tables is called snowflaking.
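Snowflaking a single dimension might look like the following sqlite3 sketch. The table names and sample products are hypothetical: a flat product dimension that repeats the category name on every row is split into a product table and a separate category lookup table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension: the category name repeats on every product row.
CREATE TABLE dim_product_flat (
    product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT
);
INSERT INTO dim_product_flat VALUES
    (1, 'Laptop', 'Hardware'), (2, 'Monitor', 'Hardware'), (3, 'Antivirus', 'Software');

-- Snowflaked version: categories move to their own lookup table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY, category_name TEXT UNIQUE
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id INTEGER REFERENCES dim_category(category_id)
);
INSERT INTO dim_category (category_name)
    SELECT DISTINCT category_name FROM dim_product_flat ORDER BY category_name;
INSERT INTO dim_product
    SELECT f.product_id, f.product_name, c.category_id
    FROM dim_product_flat AS f
    JOIN dim_category AS c ON c.category_name = f.category_name;
""")

category_count = conn.execute("SELECT COUNT(*) FROM dim_category").fetchone()[0]

# Reassembling the original view of a product now takes a join.
products = conn.execute("""
    SELECT p.product_name, c.category_name
    FROM dim_product AS p
    JOIN dim_category AS c ON c.category_id = p.category_id
    ORDER BY p.product_id
""").fetchall()
```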
The Difference between the Snowflake and Star Schemas
Both the snowflake and star schemas are useful data models for data warehousing. They differ when it comes to normalization and query complexity.
As mentioned, the snowflake schema is normalized while the star schema is flatter. The other difference is in how the schema is queried. Snowflake schemas have multiple tables per dimension and therefore require more joins, which produces more complex queries than the star schema.
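The difference in query complexity can be seen by asking the same question of both designs. In this sqlite3 sketch, all table names and rows are invented for illustration: the star version reaches the category name with one join, while the snowflake version needs two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
CREATE TABLE star_product (product_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE snow_product (product_id INTEGER PRIMARY KEY, category_id INTEGER);
INSERT INTO fact_sales VALUES (1, 100.0), (2, 40.0), (1, 60.0);
INSERT INTO star_product VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO snow_category VALUES (1, 'Hardware'), (2, 'Software');
INSERT INTO snow_product VALUES (1, 1), (2, 2);
""")

# Star schema: the category name lives directly on the dimension table.
STAR_QUERY = """
    SELECT p.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN star_product AS p ON p.product_id = f.product_id
    GROUP BY p.category_name
    ORDER BY p.category_name
"""

# Snowflake schema: an extra hop through the category lookup table.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN snow_product AS p ON p.product_id = f.product_id
    JOIN snow_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
"""

star_result = conn.execute(STAR_QUERY).fetchall()
snowflake_result = conn.execute(SNOWFLAKE_QUERY).fetchall()
```

Both queries return the same totals; the snowflake version simply has more to join, and the gap widens as dimensions are split into more levels.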
Is the Snowflake Schema a Good Schema Model?
Is snowflaking (the process of normalizing dimension tables in a star schema) a viable schema model? The simple answer is yes, but just as with other business decisions, you have to think about the cost. Consider the type of information you need to know, the number of attributes per entity, the level of data integrity required, and the amount of storage space you have.
Advantages of the Snowflake Schema
Disadvantages of the Snowflake Schema
When to Use a Snowflake Schema
There are pros and cons to using the snowflake schema, so when is a good time to use it? You can snowflake when a business entity has a large set of attributes. For example, suppose you have customers with multiple attributes in your consulting database. You may have each customer's basic information, demographics, and psychographics.
You may also want a snowflake schema in business situations that involve many-to-many relationships. You may have a customer that has different accounts or service types. For example, your customer may hire you for business coaching, website building, and website consulting. Normalizing the tables allows you to see the relationships more clearly.
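One common way to model a many-to-many relationship like this is a linking table between the two entities. The sqlite3 sketch below assumes hypothetical dim_customer, dim_service, and customer_service tables, with service names borrowed from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_service (service_id INTEGER PRIMARY KEY, service_name TEXT);
-- Linking table: one row per (customer, service) pair.
CREATE TABLE customer_service (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    service_id INTEGER REFERENCES dim_service(service_id),
    PRIMARY KEY (customer_id, service_id)
);
INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Birch');
INSERT INTO dim_service VALUES
    (1, 'Business coaching'), (2, 'Website building'), (3, 'Website consulting');
INSERT INTO customer_service VALUES (1, 1), (1, 2), (1, 3), (2, 2);
""")

# All services that customer 1 has hired, resolved through the linking table.
services_for_acme = [
    row[0]
    for row in conn.execute("""
        SELECT s.service_name
        FROM customer_service AS cs
        JOIN dim_service AS s ON s.service_id = cs.service_id
        WHERE cs.customer_id = 1
        ORDER BY s.service_id
    """)
]
```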
If you are worried about disk space, snowflaking will save storage space because repeated attribute values are stored only once. The cost of that optimized storage is more complicated queries with more joins, which may reduce query performance.
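A back-of-the-envelope Python calculation shows where the space saving comes from. The product count and category names are made up, and the sketch counts only the characters of the category names, ignoring the integer keys and any database overhead, so it overstates the real saving somewhat.

```python
# Hypothetical dimension: 1,000 products spread evenly over 4 category names.
categories = ["Consumer Electronics", "Home and Garden", "Office Supplies", "Sporting Goods"]
products = [(i, categories[i % len(categories)]) for i in range(1000)]

# Star-style: every product row stores the full category name.
flat_chars = sum(len(name) for _, name in products)

# Snowflake-style: each name is stored once in a lookup table,
# and product rows carry only a small integer key (not counted here).
lookup_chars = sum(len(name) for name in categories)

print(f"flat dimension: {flat_chars} characters of category text")
print(f"lookup table:   {lookup_chars} characters of category text")
```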
Learning the inner workings of business data is not only interesting, but it is also critical for understanding data schemas. Setting up a system that lets you dissect complicated information and understand your business will help you optimize your sales and grow your business even further. The snowflake schema is a viable data warehousing structure that allows you to save space, uphold data integrity, and optimize your business decisions.