As most of this chapter's content will deal with predicting or optimizing continuous variables, let's first understand how to measure differences in a continuous space. Unless a drastically new discovery is made soon, the space we live in is a three-dimensional Euclidean space. Whether we like it or not, this is the world we are most comfortable with today. We can completely specify our location with three continuous numbers. The difference between locations is usually measured by distance, or a metric, which is a function of two arguments that returns a single non-negative real number. Naturally, the distance, $d(X, Y)$, between X and Y should always be equal to or smaller than the sum of the distances between X and Z and between Y and Z:

$$d(X, Y) \le d(X, Z) + d(Z, Y)$$
This holds for any X, Y, and Z, and is called the triangle inequality. The two other properties of a metric are symmetry:

$$d(X, Y) = d(Y, X)$$
and non-negativity of the distance:

$$d(X, Y) \ge 0$$
Here, the metric is 0 if, and only if, $X = Y$. The $L_2$ distance is the distance as we understand it in everyday life: the square root of the sum of the squared differences along each of the dimensions. A generalization of our physical distance is the p-norm ($p = 2$ for the $L_2$ distance):

$$\|X - Y\|_p = \left( \sum_{i} |x_i - y_i|^p \right)^{1/p}$$
Here, the sum is over all the components of the X and Y vectors. If $p = 1$, the 1-norm is the sum of the absolute differences, or the Manhattan distance, as if the only way to get from point X to point Y were to move along one component at a time. This distance is also often referred to as the $L_1$ distance:

$$\|X - Y\|_1 = \sum_{i} |x_i - y_i|$$
Here is how a circle, the set of points equidistant from a center, looks in a two-dimensional space under the $L_1$ distance: it is a diamond, a square rotated by 45 degrees with its vertices on the coordinate axes.
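To make the p-norm concrete, here is a minimal Python sketch (the helper name p_norm_distance is mine, not from this chapter) that computes the distance for an arbitrary p, covering the Manhattan (p = 1) and Euclidean (p = 2) cases:

    def p_norm_distance(x, y, p=2):
        # p-norm distance between two equal-length vectors
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

    x, y = (0.0, 0.0), (3.0, 4.0)
    print(p_norm_distance(x, y, p=1))  # 7.0 -- Manhattan: walk 3 along one axis, then 4 along the other
    print(p_norm_distance(x, y, p=2))  # 5.0 -- Euclidean: the straight-line distance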
Another frequently used special case is $L_\infty$, the limit when $p \to \infty$, which is the maximum deviation along any of the components, as follows:

$$\|X - Y\|_\infty = \lim_{p \to \infty} \|X - Y\|_p = \max_{i} |x_i - y_i|$$
The equidistant circle for the $L_\infty$ distance, a square with sides parallel to the coordinate axes, is shown in Figure 05-3.
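The following self-contained Python sketch (again illustrative, with my own function names) computes the maximum component-wise deviation and hints numerically that the $L_\infty$ distance is the limit of the p-norm as p grows:

    def chebyshev_distance(x, y):
        # L-infinity (Chebyshev) distance: the largest deviation along any component
        return max(abs(a - b) for a, b in zip(x, y))

    def p_norm_distance(x, y, p):
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

    x, y = (0.0, 0.0), (3.0, 4.0)
    print(chebyshev_distance(x, y))         # 4.0
    for p in (1, 2, 10, 100):
        print(p, p_norm_distance(x, y, p))  # 7.0, 5.0, ~4.02, ~4.0 -- approaches 4.0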
I'll consider the Kullback-Leibler (KL) distance, which measures the difference between two probability distributions, later when I talk about classification; it is an example of a distance that is not symmetric and is thus not a metric.
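As a quick illustration of that asymmetry, the Python sketch below (the two example distributions are my own) computes the KL distance in both directions for a pair of discrete distributions; the two values differ, so it cannot be a metric:

    import math

    def kl_divergence(p, q):
        # D(P || Q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

    p = [0.7, 0.2, 0.1]
    q = [0.4, 0.4, 0.2]
    print(kl_divergence(p, q))  # ~0.184
    print(kl_divergence(q, p))  # ~0.192 -- not equal, hence not symmetric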
The metric properties make it easier to decompose a problem. Due to the triangle inequality, one can potentially reduce a difficult optimization of a goal into a set of simpler problems by optimizing along each of the dimensional components separately.
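As a sanity check of those properties (a throwaway Python snippet, not part of the chapter's code), the following verifies the triangle inequality, symmetry, and non-negativity of the Euclidean distance on random triples of three-dimensional points:

    import math
    import random

    def euclidean(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    random.seed(42)
    for _ in range(1000):
        X, Y, Z = (tuple(random.uniform(-1, 1) for _ in range(3)) for _ in range(3))
        assert euclidean(X, Y) <= euclidean(X, Z) + euclidean(Z, Y) + 1e-12  # triangle inequality
        assert abs(euclidean(X, Y) - euclidean(Y, X)) < 1e-12                # symmetry
        assert euclidean(X, Y) >= 0.0                                        # non-negativity
    print("all three metric properties hold on the sample")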