5.5 Summary

■ Data cube computation and exploration play an essential role in data warehousing and are important for flexible data mining in multidimensional space.

■ A data cube consists of a lattice of cuboids. Each cuboid corresponds to a different degree of summarization of the given multidimensional data. Full materialization refers to the computation of all the cuboids in a data cube lattice. Partial materialization refers to the selective computation of a subset of the cuboid cells in the lattice. Iceberg cubes and shell fragments are examples of partial materialization. An iceberg cube is a data cube that stores only those cube cells that have an aggregate value (e.g., count) above some minimum support threshold. For shell fragments of a data cube, only some cuboids involving a small number of dimensions are computed, and queries on additional combinations of the dimensions can be computed on-the-fly.

■ There are several efficient data cube computation methods. In this chapter, we discussed four cube computation methods in detail: (1) MultiWay array aggregation for materializing full data cubes in sparse-array-based, bottom-up, shared computation; (2) BUC for computing iceberg cubes by exploring ordering and sorting for efficient top-down computation; (3) Star-Cubing for computing iceberg cubes by integrating top-down and bottom-up computation using a star-tree structure; and (4) shell-fragment cubing, which supports high-dimensional OLAP by precomputing only the partitioned cube shell fragments.

■ Multidimensional data mining in cube space is the integration of knowledge discovery with multidimensional data cubes. It facilitates systematic and focused knowledge discovery in large structured and semi-structured data sets. It will continue to endow analysts with tremendous flexibility and power at multidimensional and multigranularity exploratory analysis. This is a vast open area for researchers to build powerful and sophisticated data mining mechanisms.

■ Techniques for processing advanced queries have been proposed that take advantage of cube technology. These include sampling cubes for multidimensional analysis on sampling data, and ranking cubes for efficient top-k (ranking) query processing in large relational data sets.

■ This chapter highlighted three approaches to multidimensional data analysis with data cubes. Prediction cubes compute prediction models in multidimensional cube space. They help users identify interesting data subsets at varying degrees of granularity for effective prediction. Multifeature cubes compute complex queries involving multiple dependent aggregates at multiple granularities. Exception-based, discovery-driven exploration of cube space displays visual cues to indicate discovered data exceptions at all aggregation levels, thereby guiding the user in the data analysis process.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.147.104.188