Chapter 4. Scalars and Vectors

In linear algebra, vectors are a central concept. Other mathematical entities are usually defined by their relationship to vectors: scalars, for example, are single numbers that scale vectors when they are multiplied by them (stretching or contracting each coordinate). Vectors are usually written in boldface italic and lowercase.

4.1 Introduction

Vectors refer to various concepts depending on the field they are used in. We will start by specifying what scalars and vectors are in machine learning and data science.

4.1.1 Vector Spaces

The term vector has more than one meaning, depending on the context in which it is used. We saw in “2.1 Coordinates And Vectors” that vectors can describe how to go from a point A to another point B. To characterize this displacement, we need a magnitude and a direction. In computer science, a vector is an ordered list of numbers.

In this book, when I talk about vectors, I mean free vectors represented in a Cartesian coordinate system, with the origin as the initial point. These vectors can be represented by ordered lists of numbers corresponding to the terminal point coordinates.

Axioms

The mathematical definition is a bit broader, however. In pure mathematics, a vector refers to an object that can be added to other vectors and multiplied by a scalar. These two operations must satisfy some rules called axioms. Here are these axioms:

  • $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$

  • $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$

  • $\mathbf{u} + \mathbf{0} = \mathbf{u}$

  • For all $\mathbf{u}$, there is an element $-\mathbf{u}$ so that: $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$

  • $(c + d)\mathbf{u} = c\mathbf{u} + d\mathbf{u}$

  • $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$

  • $c(d\mathbf{u}) = (cd)\mathbf{u}$

  • $1\mathbf{u} = \mathbf{u}$

A set of vectors that satisfies these axioms is called a vector space.

Let’s see some vector types that we already know from the first part of the book, and look at how the two operations addition and scalar multiplication are executed.

Dimensions

You may encounter the notation $\mathbb{R}^n$. It denotes the real coordinate space: the n-dimensional space where coordinate values are real numbers (real numbers include rational and irrational numbers).

Vectors in $\mathbb{R}^2$, usually represented in the Cartesian plane, have two components: x and y. Each vector is a point in the space. The vector space is constituted by all the vectors (i.e. all the points): the whole plane in the case of $\mathbb{R}^2$.

Vectors in $\mathbb{R}^3$ have three components: they can be represented as three numeric values, like:

$$\begin{bmatrix} 2.0 \\ 1.1 \\ -2.5 \end{bmatrix}$$

In the one-dimensional space, vectors in $\mathbb{R}^1$ have only one component and thus are represented on a line (one axis).

4.1.2 Coordinate Vectors

As you have seen, what we call coordinate vectors are ordered lists of numbers corresponding to coordinates.

Indexing

Since we’ll use Numpy to get more insight about vectors, let’s recap a few basic things that you can do to interact with vectors.

Indexing refers to the process of getting a vector element (one of the values from the vector) using its position in the vector (its index). In other words, the index of an element in a vector is its position in the vector.

For instance, let’s consider the following vector:

$$\mathbf{v} = \begin{bmatrix} 0.3 \\ 0.8 \\ 0.2 \\ 0.9 \end{bmatrix}$$

The index of the element 0.3 is 0, the index of 0.8 is 1 and so on. Let’s create this vector with Numpy and try some indexing.

import numpy as np

v = np.array([0.3, 0.8, 0.2, 0.9])
v
array([0.3, 0.8, 0.2, 0.9])

Indexing is done using square brackets ([]). For instance, you can get the first element of v with:

v[0]
0.3

Or the element at index 1 with:

v[1]
0.8

It is also possible to get multiple elements. For instance, to get the elements from 1 (included) to 3 (excluded):

v[1:3]
array([0.8, 0.2])

4.2 Special Vectors

There are some special vectors that have interesting properties. It is important to know them to go further in linear algebra.

4.2.1 Unit Vectors

We call unit vectors the vectors that have a length of 1 (more details on vector length were given in “2.1 Coordinates And Vectors”).

4.2.2 Basis Vectors

We have seen that vectors can be represented as arrows going from the origin to a point that has coordinates corresponding to the numbers stored in an ordered list of coordinates, as shown in Figure 4-1.

Figure 4-1. Relation between geometric representation of vectors and list of numbers.

The geometric representation shown in Figure 4-1 implies that we take a reference: the directions given by the two axes x and y. We call basis vectors the special vectors that point in the direction of the axes and serve as this reference; they are unit vectors (they have a length of 1). For instance, in Figure 4-2, the basis vectors i and j point in the direction of the x and y axes respectively.

Figure 4-2. The basis vectors in the Cartesian plane.


Any vector can be thought of as a combination of scaled basis vectors (we will use this idea again in “4.3.3 Using Addition and Scalar Multiplication”). For instance, in Figure 4-3, we can consider the vector v a stretched version of the basis vector i: it is scaled by a factor of 2.

Figure 4-3. The vector v can be considered a scaled version of the basis vector i.

The basis of our vector space is very important because the values used to characterize the vectors are relative to this basis. By the way, you can choose different basis vectors (you can see an example in Figure 4-4). Keep in mind that vector coordinates depend on an implicit choice of basis vectors.

Figure 4-4. New basis vectors.

This shows that a space exists independently of the coordinate system we are using. The coordinate values corresponding to a vector in this space depend on the basis we choose.

4.2.3 Zero Vectors

A zero vector or null-vector is a vector of length 0. Adding the zero vector doesn’t change a vector.
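For instance, we can check this quickly with Numpy (using the vector [2, 1] as an example):

v = np.array([2, 1])
v + np.zeros(2)
array([2., 1.])

Note that np.zeros() creates an array of floats, which is why the result is displayed with decimal points.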

4.2.4 Row and Column Vectors

We can distinguish vectors according to their shape. In column vectors, the numbers are organized as a column:

$$\begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

In row vectors, they are organized as a row:

$$\begin{bmatrix} 2 & 1 \end{bmatrix}$$

The convention is to use column vectors to list point coordinates.

Numpy is great for working with vectors. We’ll use this library to get practical insights about linear algebra. In Numpy, the function array() can be used to create vectors. Let’s try it:

v_row = np.array([1, 2, 3])
v_row
array([1, 2, 3])

We have created the Numpy equivalent of a vector: an array. You can use the attribute shape to check the shape of the array:

v_row.shape
(3,)

We can see that there is only one dimension (one number). This is because this is a vector. Numpy doesn’t distinguish between row and column vectors. To make this distinction, you would need to create a matrix with one column or one row (see [Link to Come]).
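If you do need this distinction, here is a minimal sketch: you can create two-dimensional arrays containing a single row or a single column (the variable names are just for illustration).

# a 1x3 matrix (one row)
row_matrix = np.array([[1, 2, 3]])
row_matrix.shape
(1, 3)
# a 3x1 matrix (one column)
col_matrix = np.array([[1], [2], [3]])
col_matrix.shape
(3, 1)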

4.2.5 Orthogonal Vectors

Two vectors are called orthogonal when they run in perpendicular directions. You can see a geometric example of two orthogonal vectors in Figure 4-5.

Figure 4-5. Two orthogonal vectors.

If the length of both orthogonal vectors is 1 (that is, if they are unit vectors), then they are called orthonormal.

4.3 Operations and Manipulations on Vectors

Some of the axioms characterizing vectors concern the operations of addition and multiplication by a scalar. We will see these important operations and how they can be used. We will also see a major manipulation: the transposition of vectors.

4.3.1 Scalar Multiplication

Scalar multiplication is the operation of multiplying a scalar with a vector. When multiplied by a scalar, a vector gives another vector (a scaled version of the initial vector). So, multiplying a vector by a scalar is like rescaling this vector. For instance, in Figure 4-6, the vector v is rescaled when multiplied by 1.3:

Figure 4-6. Multiplication of the vector v by a number (1.3).

Let’s represent v as a coordinate vector:

$$\mathbf{v} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

The scalar multiplication of v will give us the following result:

$$1.3\mathbf{v} = 1.3 \begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 1.3 \cdot 2 \\ 1.3 \cdot 1 \end{bmatrix} = \begin{bmatrix} 2.6 \\ 1.3 \end{bmatrix}$$

Multiplying the vector v by a scalar corresponds to multiplying each element of this vector by the scalar. With our example, both dimensions, x and y, are scaled by the scalar.

We can use Numpy to do scalar multiplication:

v = np.array([2, 1])
v
array([2, 1])
1.3 * v
array([2.6, 1.3])

Fast computations

With Numpy, vectorized operations are especially fast compared to, for instance, Python for loops.
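As a rough illustration, here is a sketch using the standard timeit module to compare a plain Python loop with the vectorized version (the array size and the number of repetitions are arbitrary, and the exact timings depend on your machine):

import timeit

v_list = list(range(100000))
v_arr = np.arange(100000)

# scalar multiplication with a Python list comprehension
timeit.timeit(lambda: [1.3 * x for x in v_list], number=100)

# scalar multiplication with a vectorized Numpy operation
timeit.timeit(lambda: 1.3 * v_arr, number=100)

On most machines, the vectorized version is considerably faster.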

4.3.2 Vector Addition

Adding two vectors gives another vector.

Let’s take two vectors ?1 and ?2.

Figure 4-7. Geometrical representation of the two vectors ?1 and ?2.

Figure 4-7 shows their geometric representation.

Here is how these two vectors can be added together. As shown in Figure 4-8, adding them can be understood as moving from the origin to the terminal point of the first vector and then, from this terminal point, applying the displacement described by the second vector.

Figure 4-8. Adding the vectors ?1 and ?2 gives another vector.

For each dimension, the coordinate of the new vector is the sum of the coordinates of the two vectors we added together. If we consider the coordinates of these geometric vectors, we can see how these operations can be done. In Figure 4-8, we had the vectors v1 and v2 that are defined by the following coordinates:

$$\mathbf{v}_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$

and

$$\mathbf{v}_2 = \begin{bmatrix} 2 \\ 0 \end{bmatrix}$$

The first value corresponds to the x-coordinate and the second value to the y-coordinate. The vector resulting from the addition of v1 and v2 has an x-coordinate that is the sum of the x-coordinates of both vectors, and the same goes for its y-coordinate:

$$\mathbf{v}_1 + \mathbf{v}_2 = \begin{bmatrix} 2 \\ 1 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \end{bmatrix} = \begin{bmatrix} 4 \\ 1 \end{bmatrix}$$

Let’s do this vector addition with Numpy:

v1 = np.array([2, 1])
v1
array([2, 1])
v2 = np.array([2, 0])
v2
array([2, 0])
v1 + v2
array([4, 1])

Example

If your data is stored in vector form, you can efficiently create a new vector that corresponds to the sum of two other vectors. For instance, let’s say that you are working on a demographic dataset where one vector corresponds to the number of male children and another vector to the number of female children. You can calculate the total number of children for each data sample by adding the two vectors together.
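Here is a minimal sketch of this idea with Numpy (the counts below are made up for illustration):

# hypothetical number of male and female children per data sample
male_children = np.array([2, 0, 1, 3])
female_children = np.array([1, 2, 0, 1])
male_children + female_children
array([3, 2, 1, 4])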

4.3.3 Using Addition and Scalar Multiplication

You can think of each number of a coordinate vector as a scalar that scales the basis vectors (for instance, i and j in the two-dimensional Cartesian plane). For example, coordinates of the vector v shown in Figure 4-1 correspond to the sum of the vector i, scaled by the x coordinate of v, and the vector j, scaled by the y coordinate of v. So we have:

$$2\mathbf{i} + 1\mathbf{j} = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$$
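We can check this with Numpy, defining i and j as the basis vectors of the Cartesian plane:

i = np.array([1, 0])
j = np.array([0, 1])
2 * i + 1 * j
array([2, 1])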

4.3.4 Transposition

We have seen that vector values can be organized as rows or as columns. The transpose of a vector is an operation that transforms a row vector into a column vector, or the opposite. It is denoted with a superscript T (for instance, $\mathbf{v}^T$).

Figure 4-9. The transpose of a column vector is a row vector.

For instance, Figure 4-9 shows the transposition of a two-dimensional vector from column to row. So you can have:

$$\begin{bmatrix} x \\ y \end{bmatrix}^T = \begin{bmatrix} x & y \end{bmatrix}$$

and

$$\begin{bmatrix} x & y \end{bmatrix}^T = \begin{bmatrix} x \\ y \end{bmatrix}$$

With Numpy, the transpose of an array is given by the attribute T. However, since the distinction between row and column vectors is not made for one-dimensional arrays, transposing has no effect on them:

v = np.array([1, 2])
v
array([1, 2])
v.T
array([1, 2])
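With a two-dimensional array (for instance, a matrix with a single row), the attribute T does swap rows and columns. A quick sketch:

v_2d = np.array([[1, 2]])
v_2d.shape
(1, 2)
v_2d.T
array([[1],
       [2]])
v_2d.T.shape
(2, 1)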

4.3.5 Operations on Other Vector Types - Functions

Other mathematical entities can be considered vectors if they satisfy the axioms listed in “4.1.1 Vector Spaces”.

Using this definition, functions can be considered vectors. Let’s try adding and multiplying functions.

Let’s take the following functions:

$$f(x) = x^2$$

and

$$g(x) = 3\sin(x)$$

Now, plot these two functions:

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

# create the x vector and the two function vectors y and y1
x = np.linspace(-2, 2, 100)
y = x ** 2
y1 = 3 * np.sin(x)

# choose figure size
plt.figure(figsize=(6, 6))

# Assure that ticks are displayed with a step equal to 1
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

# draw axes
plt.axhline(0, c='#A9A9A9')
plt.axvline(0, c='#A9A9A9')

# assure x and y axis have the same scale
# plt.axis('equal')

plt.plot(x, y, label="$x^2$")
plt.plot(x, y1, label="$3 sin(x)$")
plt.legend()
Figure 4-10. Plot of the functions f(x) and g(x).

Now, let’s plot the addition of these two functions:

# choose figure size
plt.figure(figsize=(6, 6))

# Assure that ticks are displayed with a step equal to 1
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

# draw axes
plt.axhline(0, c='#A9A9A9')
plt.axvline(0, c='#A9A9A9')

plt.plot(x, y, label="$x^2$")
plt.plot(x, y1, label="$3 sin(x)$")
plt.plot(x, y + y1, label="$x^2 + 3sin(x)$")
plt.legend()
Figure 4-11. Adding the functions f(x) and g(x).

We can see in Figure 4-11 that adding f(x) and g(x) gives another function ($x^2 + 3\sin(x)$).

We can do the same for scalar multiplication. For instance, let’s take f(x) and multiply it by -3:

# choose figure size
plt.figure(figsize=(6, 6))

# Assure that ticks are displayed with a step equal to 1
ax = plt.gca()
ax.xaxis.set_major_locator(ticker.MultipleLocator(1))
ax.yaxis.set_major_locator(ticker.MultipleLocator(1))

# draw axes
plt.axhline(0, c='#A9A9A9')
plt.axvline(0, c='#A9A9A9')

plt.plot(x, y, label="$x^2$")
plt.plot(x, y * -3, label="$-3x^2$")
plt.legend()
Figure 4-12. Multiplying the function $x^2$ by -3.

This shows the diversity of what vectors can be. You can imagine various vector spaces: the vectors only have to satisfy the axioms. Calling these objects vectors is really just a mathematical convention.

Vectors are central to linear algebra. In this section of the book, I will represent vectors as coordinate vectors, that is, ordered lists of values. You can think of these as referring to arrow coordinates in a Cartesian plane.

What you will learn in this book about linear algebra can be applied to any vectors satisfying this general definition. However, these concepts are usually explained through applying them to a single kind of vector. The definitions and proofs in math textbooks are more abstract because they have to encompass any type of vector (thus, any type of object that satisfies the axioms).

4.4 Norms

A norm is a function that takes a vector and returns a single number. You can think of the norm of a vector as its length.

The norm of a vector is denoted by adding two vertical bars on each side:

$$\Vert \mathbf{v} \Vert$$

It is different from the notation for absolute values (a single vertical bar on each side). Norms can also be used to evaluate the distance between two vectors: the difference between these two vectors gives a third vector, and we can calculate the length of this new vector. For these reasons, the norm is a major concept in machine learning and deep learning. For instance:

  • Cost function: The norm can be used to calculate the length of a vector storing the error between the estimated values made by a model and the true values.

  • Regularization: To prevent a model from overfitting the training data, the length of the vector containing the model parameters can be added to the cost function. This helps the model to avoid large parameter values. More details on this will be given in “4.6 Hands-on Project: Regularization”.

4.4.1 Definitions

Interpreting the norm as the vector length is not necessarily straightforward. This depends on how you define a distance. There are multiple norms that are calculated differently. A mathematical entity can be called a norm only if it respects the following rules:

Non-Negativity

Norms must be non-negative values. The interpretation as a length makes this rule understandable: a length can’t be negative.

Zero-Vector Norm

The norm of a vector is 0 if and only if the vector is a zero-vector.

Scalar Multiplication

The norm of a vector multiplied by a scalar corresponds to the absolute value of this scalar multiplied by the norm of the vector.

For instance, if we take the scalar k and the vector u, we have:

$$\Vert k \cdot \mathbf{u} \Vert = |k| \cdot \Vert \mathbf{u} \Vert$$

Triangle Inequality

Norms respect the rule of triangle inequality, according to which the norm of the sum of two vectors is less than or equal to the norm of the first vector summed with the norm of the second vector.

We can write this mathematically as follows:

$$\Vert \mathbf{u} + \mathbf{v} \Vert \leq \Vert \mathbf{u} \Vert + \Vert \mathbf{v} \Vert$$

As shown graphically in Figure 4-13, the triangle inequality simply means that the shortest path between two points is a straight line.

Figure 4-13. Illustration of triangle inequality.

With Numpy, the norm of a vector can be calculated with the function np.linalg.norm(). Let’s take the example corresponding to Figure 4-13:

v1 = np.array([2, 1])
v2 = np.array([1, 1])
np.linalg.norm(v1 + v2)
3.605551275463989
np.linalg.norm(v1) + np.linalg.norm(v2)
3.6502815398728847

The results show that the norm of v1 summed with the norm of v2 is larger than the norm of v1+v2.

4.4.2 Examples of Norms

Every function that satisfies the rules from the last section can be called a norm. This implies that there is more than one kind of norm.

In machine learning and deep learning, it is important to be able to compare vectors. Norms provide a way to do that; different kinds of norms can be used, and each has pros and cons.

Many norms fall into the category of the p-norm: the sum of the absolute value of each element raised to the power p, with the result of this sum raised to the power $\frac{1}{p}$. It may be easier to read the mathematical formula:

$$\Vert \mathbf{x} \Vert_p = \left( \sum_{i=1}^{m} |x_i|^p \right)^{\frac{1}{p}}$$

with m the number of elements in the vector, x our vector, and i the index of the current vector element.

Let’s split this equation to be sure that everything is clear:

  • $|x_i|$ corresponds to the absolute value of the ith element of the vector.

  • $|x_i|^p$: the absolute value is raised to the power of p.

  • $\sum_{i=1}^{m} |x_i|^p$: we calculate the sum of the powered absolute values of the elements of the vector x, from the first to the mth element (you can see more details on the sum notation in Sigma notation).

  • $\left( \sum_{i=1}^{m} |x_i|^p \right)^{\frac{1}{p}}$: we raise this result to the power $\frac{1}{p}$.

Different values of p give different norms: p=0 gives the L0 norm, p=1 the L1 norm, and p=2 the L2 norm. These three norms are the most common ones.

L0 “Norm”

We used quotation marks around the word norm because this is not a norm in the mathematical definition, as you will see. However, it can be used to compare vectors.

In the case of the L0 “norm”, we need to change the formula above because we can’t divide by 0. For this reason, we define it as:

$$\Vert \mathbf{x} \Vert_0 = \sum_{i=1}^{m} |x_i|^0$$

with m the number of elements in the vector, i the current element, and x the vector.

In addition, since $0^0$ is not defined, we treat it as 0.

The consequence is that this norm transforms each element of the vector into 0 or 1: it outputs 0 when the input equals 0 and 1 when the input equals any non-zero value. Since we take the sum of this, the L0 norm corresponds to the number of non-zero elements in the vector.

We can implement the L0 “norm” as follows:

v = np.array([1, 0, 0, -1.532, 230, 0.23, 1.7])
v
array([  1.   ,   0.   ,   0.   ,  -1.532, 230.   ,   0.23 ,   1.7  ])
np.sum(v != 0)
5

The L0 “norm” of the vector v is 5. We can also use the linalg module from Numpy: we have already used the function np.linalg.norm(), which returns the L2 norm when called without additional arguments. We can ask for the L0 “norm” with:

np.linalg.norm(v, 0)
5.0

L1 Norm

The L1 norm is a function returning the sum of the absolute value of the coordinates:

$$\Vert \mathbf{x} \Vert_1 = \left( \sum_{i=1}^{m} |x_i|^1 \right)^{\frac{1}{1}} = \sum_{i=1}^{m} |x_i|$$

with m the number of elements in the vector, i the current element, and x the vector.

Taxicab distance

The L1 norm is also called the Manhattan distance or the taxicab distance, by analogy with the path of a taxi in a street grid like Manhattan’s.

Figure 4-14. With the Manhattan distance, each path has the same length.

As illustrated in Figure 4-14, a taxi driver would prefer to take the yellow diagonal path if it were possible, because the L2 norm corresponds to distances in the physical world; with the L1 norm, every grid path between the two points has the same length.

(See https://en.wikipedia.org/wiki/Taxicab_geometry for more about taxicab geometry.)

For instance, we can see in Figure 4-15 the L1 norm of the vector v.

Figure 4-15. L1 norm of the vector v.

The L1 norm can be calculated with Numpy as follows:

v1 = np.array([2, 1])
v1
array([2, 1])
np.linalg.norm(v1, 1)
3.0

L2 Norm

The L2 norm is used even more than the L1 norm in machine learning and deep learning. It is also called the Euclidean norm (the associated distance is the Euclidean distance). The vector length measured with the L2 norm corresponds to the physical distance in the real world, which is a consequence of the Pythagorean theorem (as we have seen in “2.2 Distance formula”). The formula is as follows:

$$\Vert \mathbf{x} \Vert_2 = \left( \sum_{i=1}^{m} |x_i|^2 \right)^{\frac{1}{2}} = \sqrt{\sum_{i=1}^{m} |x_i|^2} = \sqrt{\sum_{i=1}^{m} x_i^2}$$

with m the number of elements in the vector, i the current element, and x the vector.

We don’t need to take the absolute value of coordinates since they are raised to the power of two.

For instance, we can see in Figure 4-16 the L2 norm of the vector v.

Figure 4-16. L2 norm of the vector v.

It is possible to use Numpy to calculate the L2 norm:

np.linalg.norm(v1, 2)
2.23606797749979

Squared L2 Norm

For computational reasons, the squared L2 norm can be preferred over the L2 norm. Squaring the L2 norm allows us to get rid of the square root in the formula.

$$\Vert \mathbf{x} \Vert_2^2 = \left( \sqrt{\sum_{i=1}^{m} x_i^2} \right)^2 = \sum_{i=1}^{m} x_i^2$$

with m the number of elements in the vector, i the current element, and x the vector.

The resulting norm function is simpler: this is the sum of the squared vector elements.

Norm or error function?

We used the Mean Squared Error (MSE) in “2.6 Hands-On Project: MSE Cost Function With One Parameter”. It resembles the squared L2 norm: the only difference is that we take the average of the squared errors with the MSE, and not the sum as with the squared L2 norm. A norm is a function that associates a value to a vector. When this vector contains error values (for instance, when we calculate the norm of the vector given by $\hat{y} - y$), it is called a cost function.

When the error vector is used as a cost function, its “length” is calculated. With neural networks, the cost is calculated at each epoch (at the end of the forward propagation) and its derivative is calculated to update the parameters (as we have seen in “3.1.5 Hands-On Project: Derivative Of The MSE Cost Function” and “3.4 Hands-On Project: MSE Cost Function With Two Parameters”). For this reason, it is good to have cost functions that are easily calculated and simple to differentiate from a computing perspective.

The squared L2 norm allows us to remove the square root: in addition to simplifying its calculation and its derivative, it can be vectorized, as we’ll see in “4.5.4 Hands-on Project: Vectorizing the Squared L2 Norm with the Dot Product”. This is highly desirable since it permits faster computations and parallelization.

Max Norm

The $L^\infty$, or max norm (also called the Chebyshev norm), is a function returning the largest absolute value among the vector’s elements. This norm can be mathematically expressed as:

$$\Vert \mathbf{x} \Vert_\infty = \max_i |x_i|$$

It can be calculated with Numpy:

v = np.array([1, 0, 0, -1.532, 230, 0.23, 1.7])
v
array([  1.   ,   0.   ,   0.   ,  -1.532, 230.   ,   0.23 ,   1.7  ])
np.linalg.norm(v, np.inf)
230.0

The Frobenius Norm

The Frobenius norm is the L2 norm applied to a matrix. The matrix is first flattened (converted to a one-dimensional vector) and then the L2 norm is calculated. The Frobenius norm of the matrix A is denoted $\Vert \mathbf{A} \Vert_F$. Each element of the matrix is raised to the power of 2 and the square root of the sum is calculated:

$$\Vert \mathbf{A} \Vert_F = \sqrt{\sum_{i,j} A_{i,j}^2}$$
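With Numpy, you can pass a two-dimensional array to np.linalg.norm(): by default (or with the argument 'fro') it returns the Frobenius norm. Here is a quick sketch with an arbitrary matrix:

A = np.array([[1, 2], [3, 4]])
np.linalg.norm(A, 'fro')
5.477225575051661
np.sqrt(np.sum(A ** 2))
5.477225575051661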

4.4.3 Norm Representations

We have seen that multiple types of norms can be used to measure the length of a vector. Norms have different characteristics that are visible when we plot them.

The Unit Circle

The unit circle is a shape where every point has a distance of 1 from the center. According to the kind of distance we use (L1, L2, max…), this shape is different.

In Figure 4-17, some points with a distance of 1 from the origin have been represented according to the norm.

Figure 4-17. Comparison of unit shapes for the L1, L2 and max norm.

L1 Norm

For instance, let’s calculate the L1 norm of the point with coordinates (0.5, 0.5) from the left panel. We have:

$$\sum_{i=1}^{m} |x_i| = |0.5| + |0.5| = 1$$

If we take another point, for instance the point with coordinates (0, 1), we also have:

$$\sum_{i=1}^{m} |x_i| = |1| + |0| = 1$$

The left panel shows that if we plot many points with an L1 distance of 1 from the origin, we get not a circle but a diamond shape.

L2 Norm

The middle panel in Figure 4-17 shows the unit shape for the L2 norm, which is a circle. This corresponds to what we expect in the physical world (where the word “distance” refers to the L2 distance), where every point on a circle has the same distance from the center.

Max Norm

Finally, in the right panel, every point whose largest coordinate (in absolute value) is 1 lies on the unit shape. For example, the point with coordinates (0.5, 1) has a max norm of 1 and is thus on the unit shape.

Contour Plot

Another way to visualize these norms is to use a contour plot. The following code produces Figure 4-18, the contour plot of both the L1 and L2 norms.

x = np.linspace(-10, 10, 50)
y = np.linspace(-10, 10, 50)

X, Y = np.meshgrid(x, y)

def cost_function_l1(X, Y):
    # L1 norm: sum of absolute values
    Z = np.abs(X) + np.abs(Y)
    return Z

def cost_function_l2(X, Y):
    # L2 norm: square root of the sum of squares
    Z = np.sqrt(np.square(X) + np.square(Y))
    return Z


Z_1 = cost_function_l1(X, Y)
Z_2 = cost_function_l2(X, Y)

plt.figure(figsize=(8, 8))

# left panel: L1 norm
plt.subplot(221)
plt.title("L1 Norm")
plt.contour(X, Y, Z_1)
# draw axes and assure x and y have the same scale
plt.axhline(0, c='#A9A9A9')
plt.axvline(0, c='#A9A9A9')
plt.axis('equal')

# right panel: L2 norm
plt.subplot(222)
plt.title("L2 Norm")
plt.contour(X, Y, Z_2)
plt.axhline(0, c='#A9A9A9')
plt.axvline(0, c='#A9A9A9')
plt.axis('equal')
Figure 4-18. Contour plot of the L1 and L2 norms.

4.5 The Dot Product with Vectors

Another important operation is the dot product (referring to the dot symbol used to denote this operation), also called the scalar product. Unlike addition and scalar multiplication, the dot product is an operation that takes two vectors and returns a single number (a scalar, hence the name). If the two vectors are coordinate vectors in a Cartesian coordinate system, the dot product is also called the inner product.

4.5.1 Definition

The dot product between two vectors v1 and v2, denoted by the symbol ·, is defined as the sum of the products of each pair of elements. More formally, it is expressed as:

$$\mathbf{v}_1 \cdot \mathbf{v}_2 = \sum_{i=1}^{m} v_1^{(i)} v_2^{(i)}$$

with m the number of elements in the vectors v1 and v2 (they have to contain the same number of elements) and i the index of the current vector element.

Let’s take an example. We have the following vectors:

$$\mathbf{v}_1 = \begin{bmatrix} 2 \\ 4 \\ 7 \end{bmatrix}$$

and

$$\mathbf{v}_2 = \begin{bmatrix} 5 \\ 1 \\ 3 \end{bmatrix}$$

The dot product of these two vectors is defined as:

$$\mathbf{v}_1^T \cdot \mathbf{v}_2 = \begin{bmatrix} 2 & 4 & 7 \end{bmatrix} \begin{bmatrix} 5 \\ 1 \\ 3 \end{bmatrix} = 2 \times 5 + 4 \times 1 + 7 \times 3 = 35$$

Note that we take the transpose of the first vector to ensure that the shapes match (however, using vectors and not matrices in Numpy, you won’t need to do that). The dot product between v1 and v2 is 35.

Let’s see how we do this with Numpy:

v1 = np.array([2, 4, 7])
v1
array([2, 4, 7])
v2 = np.array([5, 1, 3])
v2
array([5, 1, 3])

Numpy arrays have a method dot() that does exactly what we want.

v1.dot(v2)
35

It is also possible to use the following equivalent syntax:

np.dot(v1, v2)
35

Or, with Python 3.5+, it is also possible to use the @ operator:

v1 @ v2
35

Matrix Multiplication

Note that the dot product is different from the multiplication of the elements, which gives a vector of the same length. That operation is called element-wise multiplication, or the Hadamard product. The symbol ⊙ is generally used to denote it (you may sometimes read ∘, but that symbol is also used for function composition, which is a bit misleading).
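You can see the difference in Numpy with the vectors v1 and v2 defined above: the * operator performs element-wise multiplication, while @ (or dot()) performs the dot product.

v1 * v2
array([10,  4, 21])
v1 @ v2
35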

Special Case

The dot product between orthogonal vectors is equal to 0. For instance, if we calculate the dot product of the basis vectors:

i = np.array([1, 0])
j = np.array([0, 1])
i @ j
0

4.5.2 Geometric interpretation

We could wonder how the dot product between two vectors is interpreted when applied to geometric vectors. We have seen the geometric interpretation of vector addition and of multiplication by a scalar, but what about the dot product?

Let’s take the two following vectors:

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

and

$$\mathbf{v}_2 = \begin{bmatrix} 2 \\ 2 \end{bmatrix}$$

The dot product of these two vectors (v1·v2) gives a scalar. Let’s start by calculating it:

$$\mathbf{v}_1^T \mathbf{v}_2 = \begin{bmatrix} 1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 2 \end{bmatrix} = 1 \cdot 2 + 2 \cdot 2 = 6$$

What is the meaning of this value? Well, it is related to the idea of projecting v1 onto v2. As shown in Figure 4-19, the projection of v1 on the line with the direction of v2 is like the shadow of the vector v1 on this line.

Figure 4-19. The dot product can be seen as the length of v2 multiplied by the length of the projection.

The value of the dot product (6 in our example) corresponds to the multiplication of the length of v2 (its norm: $\Vert \mathbf{v}_2 \Vert$) and the length of the projection of v1 on v2 ($\Vert \mathbf{v}_{proj} \Vert$).

Let’s do the math. We have:

$$\Vert \mathbf{v}_2 \Vert = \sqrt{2^2 + 2^2} = \sqrt{8}$$

As we will see in [Link to Come], the projection of v1 onto v2 is:

$$\mathbf{v}_{proj} = \frac{\mathbf{v}_1^T \mathbf{v}_2}{\mathbf{v}_2^T \mathbf{v}_2} \mathbf{v}_2 = \frac{6}{8} \mathbf{v}_2 = 0.75 \mathbf{v}_2$$

So the length of vproj is:

$$\Vert \mathbf{v}_{proj} \Vert = 0.75 \Vert \mathbf{v}_2 \Vert = 0.75 \cdot \sqrt{8}$$

Finally, the multiplication of the length of v2 and the length of the projection is:

$$0.75 \cdot \sqrt{8} \cdot \sqrt{8} = 0.75 \cdot 8 = 6$$

This is the value we found with the dot product.
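Here is a minimal sketch to check this numerically with Numpy (the variable name v_proj is just for illustration):

v1 = np.array([1, 2])
v2 = np.array([2, 2])

# projection of v1 onto v2
v_proj = (v1 @ v2) / (v2 @ v2) * v2
v_proj
array([1.5, 1.5])

# length of the projection multiplied by the length of v2
round(np.linalg.norm(v_proj) * np.linalg.norm(v2), 6)
6.0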

Special cases

The value you obtain with the dot product tells you about the relation between the two vectors. As shown in Figure 4-20, if this value is positive, the vectors are going in the same general direction; if it is negative, they are going in opposite directions; and if it is zero, they are orthogonal.

Figure 4-20. Link between the value of the dot product and the vector directions. This value is positive if the two vectors have the same direction, negative if they have an opposite direction, and zero if they are orthogonal.

Why a Projection?

Multiplying vector lengths makes sense only if the vectors point in the same direction. The dot product between two vectors is thus the multiplication of their lengths after projecting one vector onto the other.

4.5.3 Properties

The dot product has the following properties. Let’s take the following example vectors to illustrate them:

v1 = np.array([2, 4, 7])
v2 = np.array([1, 3, 2])
v3 = np.array([3, 5, 5])

Distributive

The dot product is distributive. This means that, for instance, with the three vectors v1, v2 and v3, we have:

$$\mathbf{v}_1 \cdot (\mathbf{v}_2 + \mathbf{v}_3) = \mathbf{v}_1 \cdot \mathbf{v}_2 + \mathbf{v}_1 \cdot \mathbf{v}_3$$

Let’s take an example:

v1 @ (v2 + v3)
89
(v1 @ v2) + (v1 @ v3)
89

Associative

The dot product is not associative: since the dot product of two vectors gives a scalar, it is not possible to calculate the dot product between this result and a third vector. In general:

$$\mathbf{v}_1 (\mathbf{v}_2 \cdot \mathbf{v}_3) \neq (\mathbf{v}_1 \cdot \mathbf{v}_2) \mathbf{v}_3$$

We will see this in more detail with matrices in [Link to Come].
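We can check this with the example vectors defined above: each parenthesized dot product gives a scalar, and the remaining operation is a scalar multiplication of a vector.

v1 * (v2 @ v3)
array([ 56, 112, 196])
(v1 @ v2) * v3
array([ 84, 140, 140])

The two results are different vectors, confirming that the grouping matters.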

Commutative

The dot product between vectors is said to be commutative. This means that the order of the vectors around the dot product doesn’t matter. We have:

$$\mathbf{v}_1 \cdot \mathbf{v}_2 = \mathbf{v}_2 \cdot \mathbf{v}_1$$

v1 @ v2
28
v2 @ v1
28

However, be careful, because this is not necessarily true for matrices.

4.5.4 Hands-on Project: Vectorizing the Squared L2 Norm with the Dot Product

In this Hands-on Project, we will see why the squared L2 norm can be easily vectorized, improving computational efficiency.

Let’s take the vector v:

$$\mathbf{v} = \begin{bmatrix} 2 \\ 1.3 \\ 4 \\ 7.2 \end{bmatrix}$$

Remember from the formula of the squared L2 norm (“Squared L2 Norm”) that we have to calculate the sum of the squared vector elements:

$$\Vert \mathbf{x} \Vert_2^2 = \sum_{i=1}^{m} x_i^2$$

with m the number of elements in the vector, i the current element, and x the vector.

Let’s calculate the squared L2 norm of our vector v. We have:

$$\Vert \mathbf{v} \Vert_2^2 = \sum_{i=1}^{m} v_i^2 = v_1^2 + v_2^2 + v_3^2 + v_4^2 = 2^2 + 1.3^2 + 4^2 + 7.2^2 = 73.53$$

Now, let’s use the dot product to calculate the squared L2 norm. We have seen that:

$$\mathbf{x} \cdot \mathbf{x}^T = \begin{bmatrix} x_0 & x_1 & \cdots & x_m \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_m \end{bmatrix} = x_0 x_0 + x_1 x_1 + \cdots + x_m x_m = x_0^2 + x_1^2 + \cdots + x_m^2$$

If we apply this to our vector v we have:

$$\mathbf{v} \cdot \mathbf{v}^T = \begin{bmatrix} 2 & 1.3 & 4 & 7.2 \end{bmatrix} \begin{bmatrix} 2 \\ 1.3 \\ 4 \\ 7.2 \end{bmatrix} = 2 \cdot 2 + 1.3 \cdot 1.3 + 4 \cdot 4 + 7.2 \cdot 7.2 = 2^2 + 1.3^2 + 4^2 + 7.2^2 = 73.53$$

We end up with the same result.

Let’s use the dot product from Numpy to calculate the squared L2 norm of our vector v:

v = np.array([2, 1.3, 4, 7.2])
v
array([2. , 1.3, 4. , 7.2])
np.linalg.norm(v, 2) ** 2
73.53000000000002
v @ v.T
73.53

The difference between the two approaches is that vectorization doesn’t use Python for loops. At the hardware level, vectorized code can be optimized and computations can be run in parallel.
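To make the contrast concrete, here is a minimal sketch of the non-vectorized version (the helper function is just for illustration):

def squared_l2_loop(v):
    # sum of squared elements using an explicit Python loop
    total = 0
    for x in v:
        total += x * x
    return total

# same value as v @ v.T, up to floating-point rounding,
# but computed one element at a time
squared_l2_loop(v)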

4.6 Hands-on Project: Regularization

The concept of norms is very useful in machine learning and deep learning. Applied to a vector of error, the norm becomes a cost function, allowing us to evaluate the performance of a model (for instance, the MSE cost function used in linear regression, see “2.6 Hands-On Project: MSE Cost Function With One Parameter”). Norms can also be used as a regularization method. Regularization is a way to prevent overfitting of a model by adding a constraint: we add a term in the cost function
