DW Overview: Data Warehouse
Before we jump into the OLAP specific tutorial, we should get familiar with the notion of a data warehouse first. A data warehouse (DW) is an approach for creating an enterprise-wide data store. It is an integral part of many information delivery systems because it provides a single, central location where a reconciled version of data extracted from a wide variety of operational systems such as OLTPs is stored. It allows users to access critical information in a timely manner so that they can make decisions that could improve their organization's performance. However, just having a data warehouse does not provide an organization with the business benefits of data warehouses. An organization needs to deliver systems that allow executives and/or analysts to make strategic and tactical decisions based on the information stored in the data warehouses. These decision support systems are referred to as On-Line Analytical Processing (OLAP) systems. We will learn more about OLAP systems in the next sections.
As briefly mentioned above, data warehouses contain consolidated data, from several operational databases and other data sources, over long periods of time. They are projected to be hundreds of gigabytes to terabytes in size. For such a large size data warehouse, query throughput and response times are very important. To facilitate these complex analyses, data in a data warehouse is typically modeled multidimensionally.
Before we jump into the next section to learn more about data warehouse architecture, the list below gives an overview of the activities that are included in the data warehouse design and rollout.
- Define the architecture.
- Perform capacity planning.
- Select the storage servers, database and OLAP servers, and tools.
- Integrate the servers, storage, and client tools.
- Design the warehouse schema and views.
- Define the physical data warehouse organization, data placement, partitioning, and access methods.
- Connect the sources using gateways, ODBC drivers, or other wrappers.
- Design and implement scripts for data extraction, cleaning, transformation, load, and refresh.
- Populate the repository with the schema and view definitions, scripts, and other metadata.
- Design and implement user applications.
- Roll out the data warehouse and applications.
|