Linear algebra 101

I want to take a detour to talk about linear algebra. It has featured quite a bit in this book so far, although it was scarcely mentioned by name. In fact, linear algebra underlies every chapter we've covered so far.

Imagine you have two equations:

$$ax + by = 5$$
$$cx + dy = 11$$

Let's say $(a, b)$ and $(c, d)$ are $(1, 2)$ and $(3, 4)$, respectively. We can now write the equations as such:

$$x + 2y = 5$$
$$3x + 4y = 11$$

And we can solve them using basic algebra (please do work it out on your own): $x = 1$ and $y = 2$.
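If you want to check your working, one elimination route looks like this:

$$\begin{aligned} 3(x + 2y) = 3(5) \quad &\Rightarrow \quad 3x + 6y = 15 \\ (3x + 6y) - (3x + 4y) = 15 - 11 \quad &\Rightarrow \quad 2y = 4 \\ &\Rightarrow \quad y = 2, \quad x = 5 - 2(2) = 1 \end{aligned}$$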

What if you have three, four, or five simultaneous equations? It starts to get cumbersome to calculate these values. Instead, we invented a new notation: the matrix notation, which will allow us to solve simultaneous equations faster.

Matrix notation had been used for about 100 years without a name (the term "matrix" was first coined by James Sylvester) and without formal rules, until Arthur Cayley formalized them in 1858. Nonetheless, the idea of grouping parts of an equation together had long been in use.

We start by "factoring" out the equations into their parts:

$$\begin{array}{c} x + 2y = 5 \\ \hline 3x + 4y = 11 \end{array}$$

The horizontal line indicates that these are two different equations, not a ratio. Of course, we realize that we've been repeating ourselves too much, so we factor $x$ and $y$ out of the matrix:

$$\begin{array}{cc} 1 & 2 \\ \hline 3 & 4 \end{array} \; \begin{array}{c} x \\ y \end{array} = \begin{array}{c} 5 \\ 11 \end{array}$$

Here, you can see that $x$ and $y$ are only ever written once. It's rather unneat to write it the way we just did, so instead we use brackets to be neater:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$

Not only do we write it like so, we also give a specific rule on how to read this notation: each row of the matrix is multiplied entry by entry with the column vector, then summed:

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1x + 2y \\ 3x + 4y \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$
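To see this reading rule in action, here is a minimal, hand-rolled sketch in Go (the matVec helper is mine, written just for illustration; the values come from the example above):

```go
package main

import "fmt"

// matVec applies the reading rule: each row of the matrix is
// multiplied entry by entry with the vector, then summed.
func matVec(m [2][2]float64, v [2]float64) [2]float64 {
	return [2]float64{
		m[0][0]*v[0] + m[0][1]*v[1],
		m[1][0]*v[0] + m[1][1]*v[1],
	}
}

func main() {
	A := [2][2]float64{{1, 2}, {3, 4}}
	// Plugging the solution (x, y) = (1, 2) into the rule
	// reproduces the right-hand side (5, 11).
	fmt.Println(matVec(A, [2]float64{1, 2})) // [5 11]
}
```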

We should give the matrices names so we can refer to them later on:

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} x \\ y \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$

so the whole system is written compactly as $\mathbf{A}\mathbf{x} = \mathbf{b}$.

The bold indicates that the variable holds multiple values. An uppercase letter indicates a matrix ($\mathbf{A}$), and a lowercase letter indicates a vector ($\mathbf{x}$ and $\mathbf{b}$). This is to distinguish them from scalar variables (variables that only hold one value), which are typically written without boldface (for example, $x$ and $y$).

To solve the equations, the solution is simply this:

$$\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$$

The $-1$ superscript indicates that an inverse is to be taken. This is rather consistent with normal algebra.

Consider the problem $ax = b$, where you are asked to solve for $x$. The solution is simply $x = \frac{b}{a}$. Or, we can rewrite it as a series of multiplications: $x = \frac{1}{a} \times b$. And what do we know about fractions where 1 is the numerator? They can simply be written as a power of $-1$, that is, $\frac{1}{a} = a^{-1}$. Hence, we arrive at this solution equation: $x = a^{-1}b$.

Now, if you squint very carefully, the scalar version of the equation, $x = a^{-1}b$, looks very much like the matrix notation version, $\mathbf{x} = \mathbf{A}^{-1}\mathbf{b}$.
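To make this concrete in code, here is a minimal sketch in Go, assuming the Gonum mat package (gonum.org/v1/gonum/mat) is available; it computes $\mathbf{A}^{-1}$ explicitly and multiplies it by $\mathbf{b}$, mirroring the algebra above:

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

func main() {
	// The coefficient matrix and right-hand side from the
	// running example: x + 2y = 5, 3x + 4y = 11.
	A := mat.NewDense(2, 2, []float64{
		1, 2,
		3, 4,
	})
	b := mat.NewVecDense(2, []float64{5, 11})

	// Compute A^-1. Inverse returns an error if A is singular.
	var Ainv mat.Dense
	if err := Ainv.Inverse(A); err != nil {
		panic(err)
	}

	// x = A^-1 b, just like the matrix equation.
	var x mat.VecDense
	x.MulVec(&Ainv, b)

	fmt.Printf("x = %.1f, y = %.1f\n", x.AtVec(0), x.AtVec(1)) // x = 1.0, y = 2.0
}
```

In production code, you would call x.SolveVec(A, b) instead of forming the inverse explicitly, since that is numerically more stable; the explicit inverse is used here only to mirror the notation.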

How to calculate the inverse of a matrix is beyond the scope of this book. Instead, I encourage you to pick up a linear algebra textbook. I highly recommend Sheldon Axler's Linear Algebra Done Right (Springer).

To recap, here are the main points:

  • Matrix multiplication and notation were invented to solve simultaneous equations.
  • To solve simultaneous equations, we treat the matrices and vectors as though they were scalar variables and use inverses.

Now comes the interesting part. Using the same two equations, we will turn the question around. What if we knew what $x$ and $y$ are instead, but not the coefficients? Substituting $x = 1$ and $y = 2$, the equations would now look something like this:

$$a(1) + b(2) = 5$$
$$c(1) + d(2) = 11$$

Writing it in matrix form, we get the following:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 5 \\ 11 \end{bmatrix}$$

Careful readers would have caught an error by now: there are four variables ($a$, $b$, $c$, and $d$), but only two equations. From high school math, we learned that you can't solve a system of equations that has fewer equations than variables!

The thing is, your high school math teacher kind of lied to you. It is sort of possible to solve this, and you've already done so yourself in Chapter 2, Linear Regression - House Price Prediction.
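To make "sort of possible" concrete, here is a hedged sketch in Go (again assuming the Gonum mat package; the stacking of the unknowns into one vector is my framing, not the method from Chapter 2). With fewer equations than unknowns, Gonum's Solve returns the minimum-norm solution, which is one of the infinitely many answers that fit:

```go
package main

import (
	"fmt"

	"gonum.org/v1/gonum/mat"
)

func main() {
	// Stack the unknowns into one vector w = [a, b, c, d]^T.
	// The two rows encode a + 2b = 5 and c + 2d = 11: two
	// equations, four unknowns, so the system is underdetermined.
	M := mat.NewDense(2, 4, []float64{
		1, 2, 0, 0,
		0, 0, 1, 2,
	})
	rhs := mat.NewDense(2, 1, []float64{5, 11})

	// With fewer equations than unknowns, Solve computes the
	// minimum-norm solution rather than failing.
	var w mat.Dense
	if err := w.Solve(M, rhs); err != nil {
		panic(err)
	}

	fmt.Printf("a = %.1f, b = %.1f, c = %.1f, d = %.1f\n",
		w.At(0, 0), w.At(1, 0), w.At(2, 0), w.At(3, 0))
}
```

Picking one answer out of infinitely many requires an extra criterion: here it is "smallest norm"; in linear regression it is "smallest prediction error on the data". That is what made the problem in Chapter 2 solvable.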

In fact, most machine learning problems can be re-expressed in linear algebra, specifically in this form:

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

where the inputs $\mathbf{x}$ and the outputs $\mathbf{b}$ are known, and the task is to find the coefficients $\mathbf{A}$.

And this, in my opinion, is the right way to think about artificial neural networks: as a series of mathematical functions, not as an analogue of biological neurons. We will explore this a bit more in the next chapter. In fact, this understanding is vital to understanding deep learning and why it works.

For now, it suffices to go along with the more common notion that an artificial neural network acts similarly to a biological neural network.
