Publisher Summary

This chapter focuses on data cube technology. Data warehouse systems provide online analytical processing (OLAP) tools for interactive analysis of multidimensional data at varied granularity levels. OLAP tools typically use the data cube and a multidimensional data model to provide flexible access to summarized data. A data cube can interactively explore the data in a multidimensional way through OLAP operations like drill-down (to see more specialized data such as total sales per city) or roll-up (to see the data at a more generalized level such as total sales per country). Although the data cube concept was originally intended for OLAP, it is also useful for data mining. Multidimensional data mining is an approach to data mining that integrates OLAP-based data analysis with knowledge discovery techniques. It is also known as exploratory multidimensional data mining and online analytical mining (OLAM). It searches for interesting patterns by exploring the data in multidimensional space. Users can interactively drill down or roll up to varying abstraction levels to find classification models, clusters, predictive rules, and outliers. Methods for data cube computation and methods for multidimensional data analysis are focused on. Precomputing a data cube (or parts of a data cube) allows for fast accessing of summarized data. Given the high dimensionality of most data, multidimensional analysis can run into performance bottlenecks. Therefore, it is important to study data cube computation techniques. Data cube technology provides many effective and scalable methods for cube computation. Studying these methods also help in the understanding and further development of scalable methods for other data mining tasks such as the discovery of frequent patterns.

Data warehouse systems provide online analytical processing (OLAP) tools for interactive analysis of multidimensional data at varied granularity levels. OLAP tools typically use the data cube and a multidimensional data model to provide flexible access to summarized data. For example, a data cube can store precomputed measures (like count() and total_sales()) for multiple combinations of data dimensions (like item, region, and customer). Users can pose OLAP queries on the data. They can also interactively explore the data in a multidimensional way through OLAP operations like drill-down (to see more specialized data such as total sales per city) or roll-up (to see the data at a more generalized level such as total sales per country).

Although the data cube concept was originally intended for OLAP, it is also useful for data mining. Multidimensional data mining is an approach to data mining that integrates OLAP-based data analysis with knowledge discovery techniques. It is also known as exploratory multidimensional data mining and online analytical mining (OLAM). It searches for interesting patterns by exploring the data in multidimensional space. This gives users the freedom to dynamically focus on any subset of interesting dimensions. Users can interactively drill down or roll up to varying abstraction levels to find classification models, clusters, predictive rules, and outliers.

This chapter focuses on data cube technology. In particular, we study methods for data cube computation and methods for multidimensional data analysis. Precomputing a data cube (or parts of a data cube) allows for fast accessing of summarized data. Given the high dimensionality of most data, multidimensional analysis can run into performance bottlenecks. Therefore, it is important to study data cube computation techniques. Luckily, data cube technology provides many effective and scalable methods for cube computation. Studying these methods will also help in our understanding and further development of scalable methods for other data mining tasks such as the discovery of frequent patterns (Chapters 6 and 7).

We begin in Section 5.1 with preliminary concepts for cube computation. These summarize the data cube notion as a lattice of cuboids, and describe basic forms of cube materialization. General strategies for cube computation are given. Section 5.2 follows with an in-depth look at specific methods for data cube computation. We study both full materialization (i.e., where all the cuboids representing a data cube are precomputed and thereby ready for use) and partial cuboid materialization (where, say, only the more “useful” parts of the data cube are precomputed). The multiway array aggregation method is detailed for full cube computation. Methods for partial cube computation, including BUC, Star-Cubing, and the use of cube shell fragments, are discussed.

In Section 5.3, we study cube-based query processing. The techniques described build on the standard methods of cube computation presented in Section 5.2. You will learn about sampling cubes for OLAP query answering on sampling data (e.g., survey data, which represent a sample or subset of a target data population of interest). In addition, you will learn how to compute ranking cubes for efficient top-k (ranking) query processing in large relational data sets.

In Section 5.4, we describe various ways to perform multidimensional data analysis using data cubes. Prediction cubes are introduced, which facilitate predictive modeling in multidimensional space. We discuss multifeature cubes, which compute complex queries involving multiple dependent aggregates at multiple granularities. You will also learn about the exception-based discovery-driven exploration of cube space, where visual cues are displayed to indicate discovered data exceptions at all aggregation levels, thereby guiding the user in the data analysis process.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.16.203.107