Web-enabled OLAP Tutorial (WOLAPT)

TABLE OF CONTENTS

- DW Overview

--------Architecture
--------Back-end Tools

- Intro to OLAP

--------Codd's 12 Rules
--------FASMI

- MD Data Structures

- OLAP Server

- OLAP Operations

- OLAP Architectures

--------MOLAP: Part I
--------MOLAP: Part II
--------ROLAP: Part I
--------ROLAP: Part II
--------HOLAP
--------DOLAP

- Data Explosion

- OLAP Criteria

- Glossary

- References

OLAP Architectures

Regardless of the different types, all OLAP architectures involve building a multidimensional data structure, where dimensions represent business entities such as sales regions and products or natural entities such as time and geography. It is this data structure, called a cube, that users analyze via an OLAP technology. Two issues regarding the multidimensional data structure distinguish OLAP architectures:

When and how is the multidimensional data structure constructed?
Some OLAP technologies require an ETL (extract, transform, and load) process, which typically runs at off-peak usage times to build and update a persistent multidimensional data structure. On the other hand, other OLAP technologies access source data directly to build and present multidimensional data on the fly as the user performs analysis.
Where does the multidimensional data structure live and how persistent is it?
The multidimensional data structure may reside in a persistent, dedicated multidimensional database, a hypercube temporarily cached in memory, or a star schema/snowflake schema stored in a relational database.

The different OLAP architectures, MOLAP, ROLAP, HOLAP, and DOLAP, will be fully explained in the next sections.

Data Staging

Data in OLAP systems mostly originate from other systems. It is necessary for the active data to be stored in a separate form, data warehouse or data mart, for the OLAP system. Below are the main reasons for implementing a separate data warehouse instead of using the existing operational databases.

Performance

Performance can suffer when executing complex OLAP queries against the operational databases such as OLTPs. Data should be kept in a separate, optimized structure where users can rapidly access, without damaging the response from the operational systems.

Multiple Data Sources

Most OLAP systems require data sourced from various operational systems, possibly including external sources or desktop applications. Since the base systems may use different coding systems or have different periodicity, the process of merging these heterogeneous sources can be very complex.

Cleansing/Adjusting Data

OLTP systems may contain erroneous data which needs to be cleansed before it is in the stage for analysis. Also the different sources may contain data of varying quality or inconsistent representations or formats. In order to adjust data without affecting the OLTP systems, data should be kept separate.

Timing

Since most OLAP systems require data sourced from various operational systems, it is most likely that they are updated on different cycles. Therefore, they may be at different stages of update at a time. In order for the analysis to be based on consistent data, the data needs to be staged.

History

Operational databases may be missing some data that are required for decision support. They store current data only, but historical data may be critical for some decision making.

Summaries

Operational data is very detailed, but decision makers need to see it at a much higher level. For efficiency, it's usually necessary to store merged, adjusted information at summary level, which cannot be feasible in OLTP systems.

Data Updating

When the system allows users to modify the existing data, it is important to have a separate database which will not overwrite the original operational data.