A data warehouse hosts historical transaction records and is used to help a company gain a better understanding of their business processes. A schema is a visual representation of the relationships between data in a database or data warehouse environment. A snowflake schema is a well-known structure used in data warehousing systems.

Key characteristics of a data warehouse include the following:

  • Data warehouses are sophisticated tools that help you analyze data and are therefore subject-oriented
  • Data warehouses integrate data from different sources by aligning the data in terms of factors like attribute names and units of measure
  • Data warehouse store historical data and is considered nonvolatile data.
  • Data warehouses are used to analyze changes over time to understand trends, patterns, and relationships between business entities
  • Data warehouse schemas are flattened for speed and simplicity
  • Data warehouses store a large amount of data
  • Both ad hoc and pre-defined data are essential for proper analysi

What Is a Snowflake Schema?

hand working on laptop

Image Source : freeimages

A snowflake schema is a standard data warehousing schema. To understand the schema, you first need to understand a few key concepts.

  • The hierarchy describes how the data is ordered and defines how the information is aggregated (or summarized)
  • You can use a hierarchical structure to determine how the data is aggregated
  • The level describes a position in a given hierarchy (for example yearly, quarterly, and monthly levels)
  • The level describes a position in a given hierarchy (for example yearly, quarterly, and monthly levels)

A snowflake schema mimics a transactional system in terms of normalization. Normalization is the process where you eliminate redundancy from your database by separating business entities and associating them with each other through relationships. Before diving into snowflake schemas, it’s important to understand some fundamentals of two major business database architectures: OLTP and OLAP.

Understanding OLTP and 

OLAP Database Environments

Snowflake Schema

Image Source : pexels

IT systems have what is known as a “back-end,” or some type of data repository that stores information from transactional or historical events.

Online Transaction Processing (OLTP) systems handle large amounts of transactions and focus on maintaining data integrity across all systems. OLTP systems are optimized for handling inserts, updates, and deletes.

On-line Analytical Processing (OLAP) manage low transactions. They collect and store historical transactions from OLTP systems in data warehouses. They are optimized to handle complex queries and help the business with strategic decisions based on data analysis

OLAP environments are synonymous with data warehousing. One significant difference between data warehousing systems and transactional systems is that data warehousing systems are not usually normalized to the third form. Here are some other differences:

  • Data warehouses are designed to handle both pre-defined and ad hoc queries while OLTP systems usually have pre-defined operations
  • A data warehouse is updated using regular bulk data modifications techniques while OLTP systems are always up to date
  • The level describes a position in a given hierarchy (for example yearly, quarterly, and monthly levels)
  • Data warehousing typically queries thousands of record while transactional operations access a few records at a time
  • Data warehouse stores large amounts of historical data for analysis while OLTP systems usually only store a relatively small amount of historical data

Data Warehousing

A data warehouse is a repository that holds an enterprise’s historical data organized in a way to help a company make business decisions. The data comes from different business unit applications. The data can come from sources such as marketing, sales, finance, customer relationship management systems, and accounting.

A data warehouse pulls the information from all types of data sources including spreadsheets and other databases such as external SQL Server databases. The data goes through formatting and import processes so it fits the data warehouse schema.

A data warehousing schema usually includes three table types: staging, user-maintained, and reporting tables. The data is put in staging tables to clean the data, adding it to the tables. A staging area is necessary to make sure the data from different systems align. The staging area simplifies the process.

The user maintained table contains data that does not go through the standard transformation process. It is usually data provided by users that exists nowhere else. Not all data starts “neatly” in a user-defined table.

You may find that some departments transform data outside the system, and therefore it is vital you define your schema structure so that all your data is “scrubbed.”

The reporting (or presentation) tables hosts most of the data the user queries. It is the clean data optimized for reporting.

Typical Data Warehousing Tasks

Regardless of the schema you use, there are common data warehousing tasks when building a data warehouse.

  • Understanding the business requirements and goals for your data warehouse
  • Configuring your data warehouse environment
  • Designing your database schema, including the creation and management of the schema objectsManaging systems
  • user and user roles
  • Managing system security
  • Creating reports
  • Monitoring the system performance, backups, and performances
  • Transforming the data
  • While you may have someone in charge of the data warehouse itself, data transformation may happen outside the system
  • Users may clean data in Excel spreadsheets, flat files, and small databases

The Star and Snowflake Schemas

iPad on the table

Image Source : pexels

The simplest data warehouse dimensional model is the star schema. As the name implies, it looks like a star. The star schema is the most basic dimensional model. It is made up of one fact table relationally joined to dimension tables around it.

The fact table is made up of attributes that describe an event and foreign keys from the surrounding dimension tables. Its primary key is made of the foreign keys of the dimension tables. The dimensions hold data about the business objects involved in the event or transaction. Since the Star Schema is flat and de-normalized, the query performance is great.

The snowflake schema extends the star schema. Instead of having a single table, each dimension table is normalized into multiple lookup tables. The process of normalizing these tables is called snowflaking.

The Difference between the Snowflake and

Star Schemas

Both the snowflake and star schemas are great data modeling models for data warehousing. They differ when it comes to normalization and query complexity.As mentioned, the snowflake schema is normalized while the star schema is flatter. The other difference is in the schema query. Snowflake schemas have multiple tables per dimension and therefore produces a more complex query than the star schema.

Is the Snowflake Schema a Good Schema Model?

Is snowflaking (the process of normalizing dimension tables in a star schema) a viable schema model? The simple answer is yes but just like with other business models, you have to think about the cost of your decision. Consider the type of information you need to know, the number of attributes per entity, the level of data integrity required, and the amount of storage space you have.

PROS

Advantages of the Snowflake Schema

  • The snowflake schema is normalized and therefore has a higher level of integrity. Normalization removes redundancy and chances of data anomalies. You can understand the data relationship by the way the data is modeled.
  • Another advantage of the snowflake schema is that it takes up less storage space. Each entity or business object lives in separate tables, making the relationship explicit instead of embedding the relationship in one table.

CONS

Disadvantages of the Snowflake Schema

  • While normalization is excellent for reducing redundancy and increasing data integrity, it comes at a price. One downside to normalization is an increase in query complexity. Querying a snowflake schema requires more joins than querying a star schema. Though there are disadvantages to the snowflake schema, there are reasons where you may want to consider using the schema.

When to Use a Snowflake Schema

Hand writing on notebook

Image Source : pexels

There are pros and cons to using the snowflake schema, so when is a good time to use it? You can snowflake when a business entity has a large set of attributes. For example, you have customers with multiple attributes in your consulting database. You may have your customer’s basic information, customer demographics, and customers psychographics.

You may want a snowflake schema in business situations that derive a many-to-many relationship type. You may have a customer that has different accounts or service types. For example, your customer may hire you for business coaching, website building, and website consulting. Normalizing the tables allow you to see the relationships more clearly.

If you are worried about disk space, snowflaking will save storage space. The cost for optimized storage space is more complicated queries, which may cost you more for performance.

Conclusion

Learning the interworkings of business data is not only interesting, but it is critical for understanding data schemas. Setting up a system that enables a business to dissect complicated information in a way that allows you to understand your business will help you optimize your sales and grow your business even further. The star schema is a viable data warehousing structure that allows you to save space, uphold data integrity, and optimize your business decisions.