Part 2. Multidimensional queries

The common thread of the central part of this book is nearest neighbor search. It is first introduced as yet another special case of search, and then used as a building block for more advanced algorithms.

This section opens with a description of the issues and challenges that arise when dealing with multi-dimensional data: indexing it and performing spatial queries. We will once again show how ad hoc data structures can provide drastic improvements over basic search algorithms.

Next, this section describes two advanced data structures that can be used to search multi-dimensional data.

In the second half of this part, we’ll check out applications of nearest neighbor search, starting with some practical examples, and then focusing on clustering, which heavily leverages spatial queries. Talking about clustering also allows us to introduce distributed computing, in particular the MapReduce programming model, which can be used to process volumes of data that are too large to be handled by any single machine.

There is an important difference in the structure of part 2 in comparison to the first seven chapters. As we’ll see, the discussion about these topics is particularly rich, and there is no way that we can cover them, or even just their crucial bits, in a single chapter. Therefore, while in part 1 each chapter followed a different pattern to explain topics, we’ll have to follow a single pattern throughout part 2, where each chapter will cover only one piece of our usual discussion.

Chapter 8 introduces the nearest neighbor problem, discusses a few naïve approaches to multi-dimensional queries, and introduces the problem used as an example for most of part 2.
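Before any of the tree-based solutions, the simplest answer to a nearest neighbor query is a brute-force linear scan over all points. The following is a minimal sketch of that naïve baseline (function and variable names are illustrative, not taken from the book's code); the specialized data structures in the next chapters exist precisely to beat its O(n) cost per query.

```python
import math

def nearest_neighbor(points, target):
    """Return the point in `points` closest to `target`
    under Euclidean distance.

    Brute-force linear scan: examines every point, so each
    query costs O(n). This is the baseline that k-d trees
    and ss-trees are designed to improve on.
    """
    best, best_dist = None, math.inf
    for p in points:
        d = math.dist(p, target)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best, best_dist = p, d
    return best

# Hypothetical 2D example: three candidate locations, one query point.
candidates = [(0.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]
print(nearest_neighbor(candidates, (2.5, 3.0)))  # (3.0, 4.0)
```

The scan is trivially correct, which makes it a useful reference implementation for testing the more intricate indexes introduced in chapters 9 and 10.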

Chapter 9 describes k-d trees, a solution for efficient search in multidimensional data sets, focusing on the 2D case (for the sake of visualization).

Chapter 10 presents more advanced versions of these trees: r-trees, which are illustrated briefly, and ss-trees, for which we'll instead delve into the specifications of each method. In the final sections of this chapter we also discuss the performance of ss-trees, how they can be improved further, and how they compare to k-d trees.

Chapter 11 focuses on the applications of nearest neighbor search, with a use case described in depth (finding the closest warehouse from which goods should be shipped to customers), but also introducing several problems that can benefit from the application of k-d trees or ss-trees.

Chapter 12 focuses on an interesting use case that leverages the efficient nearest neighbor search algorithms presented so far. It enters the machine-learning world and describes three clustering algorithms, k-means, DBSCAN, and OPTICS.

Chapter 13 concludes this part by introducing MapReduce, a powerful computational model for distributed computing, and applies it to three clustering algorithms: k-means and DBSCAN, discussed in chapter 12, and canopy clustering, introduced in this chapter.
