Intro to OLAP: On-Line Analytical Processing
We briefly tasted OLAP in the previous section, DW Overview. So what is exactly OLAP? On-Line Analytical Processing (OLAP) is a decision support tool that allows users to analyze different dimensions of multidimensional data. Designed for executives looking to make sense out of their information, OLAP structures data hierarchically to reflect the real dimensionality of the enterprise as understood by the users. Users can pivot, filter, drill down and drill up data and generate numbers of views with simple mouse manipulations. It is commonly used in business reporting for sales, marketing, management, data mining, and similar areas. The OLAP operations will be covered more in detail in the other section, OLAP Operations.
An OLAP structure created from the operational data is called an OLAP cube. As Figure 1 shows, the cube holds data more like a 3D spreadsheet rather than a relational database, allowing different views of the data to be quickly displayed. In multidimensional OLAP (MOLAP) databases, cubes are created and stored physically, whereas in relational OLAP (ROLAP) databases, cubes are virtually created, based on a star or snowflake schema. Don't understand how MOLAP and ROLAP works? We will cover this topic in detail in the other sections under OLAP Architectures. For now, just understand that these are two different types of OLAP.
Figure 1.
OLAP cube with
three dimensions |
|
Let's briefly touch on the differences between star and snowflake schemas. As shown in Figure 2, the star schema is used by most data warehouses to represent the multidimensional data model. The star schema is the simplest data warehouse schema, consisting of a single fact table and a set of dimension tables. The fact table is at the center. Each tuple in the fact table consists of a pointer to each of the dimensions that provides its multidimensional coordinates, and stores the numeric measures for those coordinates. Numerous dimension tables are linked to the fact tables. Each dimension table consists of columns that correspond to attributes of the dimension.
These tables indicate how the aggregations of relational data can be analyzed. An example would be to compute the total sales of a particular product by each state for particular year.
Figure 2. Typical star schema
|
|
In comparison to the star schema, the snowflake schema is a more complex data warehouse model. The snowflake schema normalizes dimensions to eliminate redundancy. The dimension data has been grouped into multiple tables instead of one large table. While this schema saves more space, it increases the number of dimension tables and requires more foreign key joins, resulting in more complex queries and reduced query performance. A graphic illustration is shown in Figure 3.
Figure 3. Typical snowflake schema
|
|
|