10.4.6 Applications of Sketches
A large number of applications can utilize the sketch based strategies. One
practical issue is the heavy hitters [19, 49, 17]. For this problem, we need to
detect the most frequent items in the data stream. Recognizing the difference
among networks in the data stream was explored in [20], and detecting the
differences among data streams was studied in [25, 26]. Similar issues on
XML data (or tree data) were presented in [57, 58, 61]. These works aimed
to construct the synopsis for structured queries, which can be used later to
improve query processing performance.
Sketches based strategies are also well utilized in network research.
Improving the communication effi ciency for signals in sensor networks
is important. Moreover, considering resource limitations (e.g., battery),
effi cient storage by using concise summarization on the stream data is an
essential issue for sensor networks. A number of works have been conducted
to address the aforementioned issues, [15, 38, 46]. Please refer to [8] for
more details on these issues.
10.4.7 Advantages and Limitations of Sketch Strategies
Sketch-based strategies have several advantages. First is the space usage.
Sketch-based approaches have been theoretically and experimentally
proven to obtain an optimal sub-linear space usage in the data size. This
fi nding can be attributed to the fact that the space requirement is logarithmic
in the number of distinct items in the stream, which is relevant small by
considering the large volume of the data.
Despite the advantages of sketch based methods, several challenging
issues remain. First, almost all related studies use L
p
norm as the aggregate
measure, which may not refl ect the actual data distribution. Thus, the sketch
summarization may fail to store the essential information of the data.
Another issue is high dimensionality. The existence of hundreds of
independent dimensions in the data stream may hinder the practical
usage of the existing state-of-the-art sketch-based techniques. This issue
has been raised by several researchers [16]. However, the problem remains
challenging because of its intrinsic complexity.
As highlighted in [8], most sketch-based works only focus on identifying
the frequent instances and estimating the frequency moments and join size.
This emphasis on the micro view may neglect the macro trends in the stream
data, such as the temporal property. Thus, the temporal information may
be lost because of the transformation process when building the sketches.
Although several scholars have mentioned this issue and consequently
introduced effective techniques for temporal analysis [41], the strategy
requires signifi cant space usage, which makes the approach impractical
Data Stream 227