2. Some Technical Background

2.1 About Our Setup

We’re about to get down—funk style—with some coding. The chapters in this book started life as Jupyter notebooks. If you’re unfamiliar with Jupyter notebooks, they are a very cool environment for working with Python code, text, and graphics in one browser tab. Many Python-related blogs are built with Jupyter notebooks. At the beginning of each chapter, I’ll execute some lines of code to set up the coding environment.

The content of mlwpy.py is shown in Appendix A. While from module import * is generally not recommended, in this case I’m using it specifically to get all of the definitions in mlwpy.py included in our notebook environment without taking up forty lines of code. Since scikit-learn is highly modularized—which results in many, many import lines—the import * is a nice way around a long setup block in every chapter. %matplotlib inline tells the notebook system to display the graphics made by Python inline with the text.

In [1]:

from mlwpy import *
%matplotlib inline

2.2 The Need for Mathematical Language

It is very difficult to talk about machine learning (ML) without discussing some mathematics. Many ML textbooks take that to an extreme: they are math textbooks that happen to discuss machine learning. I’m going to flip that script on its head. I want you to understand the math we use and to have some intuition from daily life about what the math-symbol-speak means when you see it. I’m going to minimize the amount of math that I throw around. I also want us—that’s you and me together on this wonderful ride—to see the math as code before, or very shortly after, seeing it as mathematical symbols.

Maybe, just maybe, after doing all that you might decide you want to dig into the mathematics more deeply. Great! There are endless options to do so. But that’s not our game. We care more about the ideas of machine learning than using high-end math to express them. Thankfully, we only need a few ideas from the mathematical world:

  • Simplifying equations (algebra),

  • A few concepts related to randomness and chance (probability),

  • Graphing data on a grid (geometry), and

  • A compact notation to express some arithmetic (symbols).

Throughout our discussion, we’ll use some algebra to write down ideas precisely and without unnecessary verbalisms. The ideas of probability underlie many machine learning methods. Sometimes this is very direct, as in Naive Bayes (NB); sometimes it is less direct, as in Support Vector Machines (SVMs) and Decision Trees (DTs). Some methods rely very directly on a geometric description of data: SVMs and DTs shine here. Other methods, such as NB, require a bit of squinting to see how they can be viewed through a geometric lens. Our bits of notation are pretty low-key, but they amount to a specialized vocabulary that allows us to pack ideas into boxes that, in turn, fit into larger packages. If this sounds to you like refactoring a computer program from a single monolithic script into modular functions, give yourself a prize. That’s exactly what is happening.

Make no mistake: a deep dive into the arcane mysteries of machine learning requires more, and deeper, mathematics than we will discuss. However, the ideas we will discuss are the first steps and the conceptual foundation of a more complicated presentation. Before taking those first steps, let’s introduce the major Python packages we’ll use to make these abstract mathematical ideas concrete.

2.3 Our Software for Tackling Machine Learning

The one tool I expect you to have in your toolbox is a basic understanding of good, old-fashioned procedural programming in Python. I’ll do my best to discuss any topics that are more intermediate or advanced. We’ll be using a few modules from the Python standard library that you may not have seen: itertools, collections, and functools.

We’ll also be making use of several members of the Python number-crunching and data science stack: numpy, pandas, matplotlib, and seaborn. I won’t have time to teach you all the details about these tools. However, we won’t be using their more complicated features, so nothing should be too mind-blowing. We’ll also briefly touch on one or two other packages, but they are relatively minor players.

Of course, much of the reason to use the number-crunching tools is because they form the foundation of, or work well with, scikit-learn. sklearn is a great environment for playing with the ideas of machine learning. It implements many different learning algorithms and evaluation strategies and gives you a uniform interface to run them. Win, win, and win. If you’ve never had the struggle—pleasure?—of integrating several different command-line learning programs . . . you didn’t miss anything. Enjoy your world, it’s a better place. A side note: scikit-learn is the project’s name; sklearn is the name of the Python package. People use them interchangeably in conversation. I usually write sklearn because it is shorter.

2.4 Probability

Most of us are practically exposed to probability in our youth: rolling dice, flipping coins, and playing cards all give concrete examples of random events. When I roll a standard six-sided die—you role-playing gamers know about all the other-sided dice that are out there—there are six different outcomes that can happen. Each of those events has an equal chance of occurring. We say that the probability of each event is 1/6. Mathematically, if I—a Roman numeral one, not me, myself, and I—is the case where we roll a one, we’ll write that as P(I) = 1/6. We read this as “the probability of rolling a one is one-sixth.”

We can roll dice in Python in a few different ways. Using NumPy, we can generate evenly weighted random events with np.random.randint. randint is designed to mimic Python’s indexing semantics, which means that we include the starting point and we exclude the ending point. The practical upshot is that if we want values from 1 to 6, we need to start at 1 and end at 7: the 7 will not be included. If you are more mathematically inclined, you can remember this as a half-open interval.

In [2]:

np.random.randint(1, 7)

Out[2]:

4

If we want to convince ourselves that the numbers are really being generated with equal likelihoods (as with a perfect, fair die), we can draw a chart of the frequency of the outcomes of many rolls. We’ll do that in three steps. We’ll roll a die, either a few times or many times:

In [3]:

few_rolls  = np.random.randint(1, 7, size=10)
many_rolls = np.random.randint(1, 7, size=1000)

We’ll count up how many times each event occurred with np.histogram. Note that np.histogram is designed around plotting buckets of continuous values. Since we want to capture discrete values, we have to create a bucket that surrounds our values of interest. We capture the ones, I, by making a bucket between 0.5 and 1.5.

In [4]:

few_counts  = np.histogram(few_rolls,  bins=np.arange(.5, 7.5))[0]
many_counts = np.histogram(many_rolls, bins=np.arange(.5, 7.5))[0]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(np.arange(1, 7), few_counts)
ax2.bar(np.arange(1, 7), many_counts);
[Figure: bar charts of the outcome counts for the few rolls (left) and the many rolls (right).]

There’s an important lesson here. When dealing with random events and overall behavior, a small sample can be misleading. We may need to crank up the number of examples—rolls, in this case—to get a better picture of the underlying behavior. You might ask why I didn’t use matplotlib’s built-in hist function to make the graphs in one step. hist works well enough for larger datasets that take a wider range of values but, unfortunately, it ends up obfuscating the simple case of a few discrete values. Try it out for yourself.

2.4.1 Primitive Events

Before experimenting, we assumed that the probability of rolling a one is one out of six. That number comes from (# of ways this event can occur) / (# of different events). We can test our understanding of that ratio by asking, “What is the probability of rolling an odd number?” Well, using Roman numerals to indicate the outcomes of a roll, the odd numbers in our space of events are I, III, V. There are three of these and there are six total primitive events. So, we have P(odd) = 3/6 = 1/2. Fortunately, that gels with our intuition.

We can approach this calculation a different way: an odd can occur in three ways and those three ways don’t overlap. So, we can add up the individual event probabilities: P(odd) = P(I) + P(III) + P(V) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2. We can get probabilities of compound events by either counting primitive events or adding up the probabilities of primitive events. It’s the same thing done in two different ways.
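
If you would like to see those two routes side by side, here is a quick sketch in plain Python (a supplement, not one of the numbered notebook cells) that computes P(odd) by counting events and by adding up primitive-event probabilities:

events = [1, 2, 3, 4, 5, 6]
odds   = [e for e in events if e % 2 == 1]

# route 1: count the ways the compound event can happen
p_by_counting = len(odds) / len(events)

# route 2: add up the probabilities of the primitive events
p_by_adding = sum(1 / len(events) for _ in odds)

print(p_by_counting, p_by_adding)   # 0.5 both ways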

This basic scenario gives us an in to talk about a number of important aspects of probability:

  • The sum of the probabilities of all possible primitive events in a universe is 1.

    P(I) + P(II) + P(III) + P(IV) + P(V) + P(VI) = 1.

  • The probability of an event not occurring is 1 minus the probability of it occurring. P(even) = 1 – P(not even) = 1 – P(odd). When discussing probabilities, we often write “not” as ¬, as in P(¬even). So, P(¬even) = 1 – P(even).

  • There are nonprimitive events. Such a compound event is a combination of primitive events. The event we called odd joined together three primitive events.

  • A roll will be even or odd, but not both, and all rolls are either even or odd. These two compound events cover all the possible primitive events without any overlap. So, P(even) + P(odd) = 1.

Compound events are also recursive. We can create a compound event from other compound events. Suppose I ask, “What is the probability of getting an odd or a value greater than 3 or both?” That group of events, taken together, is a larger group of primitive events. If I attack this by counting those primitive events, I see that the odds are odd = {I, III, V} and the big values are big = {IV, V, VI}. Putting them together, I get {I, III, IV, V, VI} or 5/6. The probability of this compound event is a bit different from the probability of odds being 1/2 and the probability of greater-than-3 being 1/2. I can’t just add those probabilities. Why not? Because I’d get a sum of one—meaning we covered everything—but that only demonstrates the error. The reason is that the two compound events overlap: they share primitive events. Rolling a five, V, occurs in both subgroups. Since they overlap, we can’t just add the two together. We have to add up everything in both groups individually and then remove one of whatever was double-counted. The double-counted events were in both groups—they were odd and big. In this case, there is just one double-counted event, V. So, removing them looks like P(odd) + P(big) – P(odd and big). That’s 1/2 + 1/2 − 1/6 = 5/6.
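
Here is a small set-based sketch of that overlap argument (again, a supplement rather than one of the notebook cells). The only assumption is that every face of the die is equally likely:

events = set(range(1, 7))
odd = {1, 3, 5}
big = {4, 5, 6}

def p(group):
    return len(group) / len(events)

# counting the combined group of primitive events directly ...
print(p(odd | big))                     # 5/6

# ... matches adding the groups and removing the double-counted overlap
print(p(odd) + p(big) - p(odd & big))   # 1/2 + 1/2 - 1/6 = 5/6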

2.4.2 Independence

If we roll two dice, a few interesting things happen. The two dice don’t communicate or act on one another in any way. Rolling a I on one die does not make it more or less likely to roll any value on the other die. This concept is called independence: the two events—rolls of the individual dice—are independent of each other.

For example, consider a different set of outcomes where each event is the sum of the rolls of two dice. Our sums are going to be values between 2 (we roll two Is) and 12 (we roll two VIs). What is the probability of getting a sum of 2? We can go back to the counting method: there are 36 total possibilities (6 for each die, and 6 × 6 = 36) and the only way we can roll a total of 2 is by rolling two Is, which can only happen one way. So, P(2) = 1/36. We can also reach that conclusion—because the dice don’t communicate or influence each other—by rolling I on die 1 and I on die 2, giving P(I₁)P(I₂) = 1/6 × 1/6 = 1/36. If events are independent, we can multiply their probabilities to get the joint probability of both occurring. Also, if we multiply the probabilities and we get the same probability as the overall resulting probability we calculated by counting, we know the events must be independent. Independent probabilities work both ways: they are an if-and-only-if.

We can combine the ideas of (1) summing the probabilities of different events and (2) the independence of events, to calculate the probability of getting a total of three P(3). Using the event counting method, we figure that this event can happen in two different ways: we roll (I, II) or we roll (II, I) giving 2/36 = 1/18. Using probability calculations, we can write:

P(3) = P((I, II)) + P((II, I)) = P(I)P(II) + P(II)P(I) = 1/6 × 1/6 + 1/6 × 1/6 = 2/36 = 1/18

Phew, that was a lot of work to verify the answer. Often, we can make use of shortcuts to reduce the number of calculations we have to perform. Sometimes these shortcuts are from knowledge of the problem and sometimes they are clever applications of the rules of probability we’ve seen so far. If we see multiplication, we can mentally think about the two-dice scenario. If we have a scenario like the dice, we can multiply.
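
As a sanity check on both rules, here is a short sketch (a supplement to the chapter’s cells) that enumerates all 36 two-dice outcomes and compares the counted probabilities with the multiplied ones:

import itertools

# all 36 equally likely (die1, die2) outcomes
rolls = list(itertools.product(range(1, 7), repeat=2))

p_sum_2 = sum(1 for a, b in rolls if a + b == 2) / len(rolls)
p_sum_3 = sum(1 for a, b in rolls if a + b == 3) / len(rolls)

print(p_sum_2, 1/6 * 1/6)            # 1/36 by counting and by multiplying
print(p_sum_3, 1/6*1/6 + 1/6*1/6)    # 2/36 by counting and by adding products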

2.4.3 Conditional Probability

Let’s create one more scenario. In classic probability-story fashion, we will talk about two urns. Why urns? I guess that, before we had buckets, people had urns. So, if you don’t like urns, you can think about buckets. I digress.

The first urn UI has three red balls and one blue ball in it. The second urn UII has two red balls and two blue balls. We flip a coin and then we pull a ball out of an urn. If the coin comes up heads, we pick from UI; otherwise, we pick from UII. We end up at UI half the time and then we pick a red ball 3/4 of those times. We end up at UII the other half of the time and we pick a red ball 2/4 of those times. This scenario is like wandering down a path with a few intersections. As we walk along, we are presented with a different set of options at the next crossroads.

If we sketch out the paths, it looks like Figure 2.1. If we count up the possibilities, we will see that under the whole game, we have five red outcomes and three blue outcomes. P(red) = 5/8. Simple, right? Not so fast, speedy! This counting argument only works when we have equally likely choices at each step. Imagine we have a very wacky coin that causes me to end up at Urn I 999 out of 1000 times: then our chances of picking a red ball would end up quite close to the chance of just picking a red ball from Urn I. It would be similar to almost ignoring the existence of Urn II. We should account for this difference and, at the same time, make use of updated information that might come along the way.

[Figure: a coin flip leads to Urn I (three red balls, one blue) on heads and to Urn II (two red balls, two blue) on tails.]

Figure 2.1 A two-step game from coins to urns.

If we play a partial game and we know that we’re at Urn I—for example, after we’ve flipped a head in the first step—our odds of picking a red ball are different. Namely, the probability of picking a red ball—given that we are picking from Urn I—is 3/4. In mathematics, we write this as P(red | UI) = 3/4. The vertical bar, |, is read as “given”. Conditioning—a commonly verbed noun in machine learning and statistics—constrains us to a subset of the primitive events that could possibly occur. In this case, we condition on the occurrence of a head on our coin flip.

How often do we end up picking a red ball from Urn I? Well, to do that we have to (1) get to Urn I by flipping a head, and then (2) pick a red ball. Since the coin doesn’t affect the events in Urn I—it picked Urn I, not the balls within Urn I—the two are independent and we can multiply them to find the joint probability of the two events occurring. So, P(red and UI) = P(red | UI) P(UI) = 3/4 × 1/2 = 3/8. The order here may seem a bit weird. I’ve written it with the later event—the event that depends on UI—first and the event that kicks things off, UI, second. This order is what you’ll usually see in written mathematics. Why? Largely because it places the | UI next to the P(UI). You can think about it as reading from the bottom of the diagram back towards the top.

Since there are two nonoverlapping ways to pick a red ball (either from Urn I or from Urn II), we can add up the different possibilities. Just as we did for Urn I, for Urn II we have P(red and UII) = P(red | UII) P(UII) = 2/4 × 1/2 = 2/8. Adding up the alternative ways of getting red balls, either out of Urn I or out of Urn II, gives us: P(red) = P(red | UI)P(UI) + P(red | UII)P(UII) = 3/8 + 2/8 = 5/8. Mon dieu! At least we got the same answer as we got by the simple counting method. But now, you know what that important vertical bar, P(|), means.
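
If you’d rather trust a simulation than my arithmetic, here is a rough sketch (a supplement, using the fair coin and urn contents from above) that plays the two-step game many times and compares the fraction of red draws with 5/8:

import numpy as np

n_games = 100000
heads = np.random.randint(0, 2, size=n_games) == 1       # heads -> Urn I

red_from_u1 = np.random.rand(n_games) < 3/4               # Urn I is 3/4 red
red_from_u2 = np.random.rand(n_games) < 2/4               # Urn II is 2/4 red

red = np.where(heads, red_from_u1, red_from_u2)
print(red.mean(), 5/8)    # the simulated fraction should be close to .625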

2.4.4 Distributions

There are many different ways of assigning probabilities to events. Some of them are based on direct, real-world experiences like dice and cards. Others are based on hypothetical scenarios. We call the mapping between events and probabilities a probability distribution. If you give me an event, then I can look it up in the probability distribution and tell you the probability that it occurred. Based on the rules of probability we just discussed, we can also calculate the probabilities of more complicated events. When a group of events shares a common probability value—like the different faces on a fair die—we call it a uniform distribution. Like Storm Troopers in uniform, they all look the same.

There is one other, very common distribution that we’ll talk about. It’s so fundamental that there are multiple ways to approach it. We’re going to go back to coin flipping. If I flip a coin many, many times and count the number of heads, here’s what happens as we increase the number of flips:

In [5]:

import scipy.stats as ss

b = ss.distributions.binom
for flips in [5, 10, 20, 40, 80]:
    # binomial with .5 is result of many coin flips
    success = np.arange(flips)
    our_distribution = b.pmf(success, flips, .5)
    plt.hist(success, flips, weights=our_distribution)
plt.xlim(0, 55);
[Figure: overlapping histograms of head counts for 5, 10, 20, 40, and 80 flips.]

If I ignore that the whole numbers are counts and replace the graph with a smoother curve that takes values everywhere, instead of the stair steps that climb or descend at whole numbers, I get something like this:

In [6]:

b = ss.distributions.binom
n = ss.distributions.norm

for flips in [5, 10, 20, 40, 80]:
    # binomial coin flips
    success = np.arange(flips)
    our_distribution = b.pmf(success, flips, .5)
    plt.hist(success, flips, weights=our_distribution)

    # normal approximation to that binomial
    # we have to set the mean and standard deviation
    mu      = flips * .5
    std_dev = np.sqrt(flips * .5 * (1-.5))

    # we have to set up both the x and y points for the normal
    # we get the ys from the distribution (a function)
    # we have to feed it xs, we set those up here
    norm_x = np.linspace(mu-3*std_dev, mu+3*std_dev, 100)
    norm_y = n.pdf(norm_x, mu, std_dev)
    plt.plot(norm_x, norm_y, 'k');

plt.xlim(0, 55);
[Figure: the same histograms with their smooth normal approximations drawn on top.]

You can think about increasing the number of coin flips as increasing the accuracy of a measurement—we get more decimals of accuracy. We see the difference between 4 and 5 out of 10 and then the difference between 16, 17, 18, 19, and 20 out of 40. Instead of a big step, it becomes a smaller, more gradual step. The step-like sequences become progressively better approximated by the smooth curves. Often, these smooth curves are called bell-shaped curves—and, to keep the statisticians happy, yes, there are other bell-shaped curves out there. The specific bell-shaped curve that we are stepping towards is called the normal distribution.

The normal distribution has three important characteristics:

  1. Its midpoint has the most likely value—the hump in the middle.

  2. It is symmetric—can be mirrored—about its midpoint.

  3. As we get further from the midpoint, the values fall off more and more quickly.

There are a variety of ways to make these characteristics mathematically precise. It turns out that with suitable mathese and small-print details, those characteristics also lead to the normal distribution—the smooth curve we were working towards! My mathematical colleagues may cringe, but the primary feature we need from the normal distribution is its shape.

2.5 Linear Combinations, Weighted Sums, and Dot Products

When mathematical folks talk about a linear combination, they are using a technical term for what we do when we check out from the grocery store. If your grocery store bill looks like:

Product    Quantity    Cost Per
Wine           2         12.50
Orange        12           .50
Muffin         3          1.75

you can figure out the total cost with some arithmetic:

In [7]:

(2 * 12.50) + (12 * .5) + (3 * 1.75)

Out[7]:

36.25

We might think of this as a weighted sum. A sum by itself is simply adding things up. The total number of items we bought is:

In [8]:

2 + 12 + 3

Out[8]:

17

However, when we buy things, we pay for each item based on its cost. To get a total cost, we have to add up a sequence of costs times quantities. I can phrase that in a slightly different way: we have to weight the quantities of different items by their respective prices. For example, each orange costs $0.50 and our total cost for oranges is $6. Why? Besides the invisible hand of economics, the grocery store does not want us to pay the same amount of money for the bottle of wine as we do for an orange! In fact, we don’t want that either: $10 oranges aren’t really a thing, are they? Here’s a concrete example:

In [9]:

# pure python, old-school
quantity = [2, 12, 3]
costs    = [12.5, .5, 1.75]
partial_cost = []
for q,c in zip(quantity, costs):
    partial_cost.append(q*c)
sum(partial_cost)

Out[9]:

36.25

In [10]:

# pure python, for the new-school, cool kids
quantity = [2, 12, 3]
costs    = [12.5, .5, 1.75]
sum(q*c for q,c in zip(quantity,costs))

Out[10]:

36.25

Let’s return to computing the total cost. If I line up the quantities and costs in NumPy arrays, I can run the same calculation. I can also get the benefits of data that is more organized under the hood, concise code that is easily extendible for more quantities and costs, and better small- and large-scale performance. Whoa! Let’s do it.

In [11]:

quantity = np.array([2, 12, 3])
costs    = np.array([12.5, .5, 1.75])
np.sum(quantity * costs) # element-wise multiplication

Out[11]:

36.25

This calculation can also be performed by NumPy with np.dot. dot multiplies the elements pairwise, selecting the pairs in lockstep down the two arrays, and then adds them up:

In [12]:

print(quantity.dot(costs),     # dot-product way 1
      np.dot(quantity, costs), # dot-product way 2
      quantity @ costs,        # dot-product way 3
                               # (new addition to the family!)
      sep='\n')
36.25
36.25
36.25

If you were ever exposed to dot products and got completely lost when your teacher started discussing geometry and cosines and vector lengths, I’m so, so sorry! Your teacher wasn’t wrong, but the idea is no more complicated than checking out of the grocery store. There are two things that make the linear combination (expressed in a dot product): (1) we multiply the values pairwise, and (2) we add up all those subresults. These correspond to (1) a single multiplication to create subtotals for each line on a receipt and (2) adding those subtotals together to get your final bill.

You’ll also see the dot product written mathematically (using q for quantity and c for cost) as Σᵢ qᵢcᵢ. If you haven’t seen this notation before, here’s a breakdown:

  1. The Σ, a capital Greek sigma, means add up,

  2. The qᵢcᵢ means multiply two things, and

  3. The i ties the pieces together in lockstep like a sequence index.

More briefly, it says, “add up all of the element-wise multiplied q and c.” Even more briefly, we might call this the sum product of the quantities and costs. At our level, we can use sum product as a synonym for dot product.

So, combining NumPy on the left-hand side and mathematics on the right-hand side, we have:

np.dot(quantity, costs) = Σᵢ qᵢcᵢ

Sometimes, that will be written as briefly as qc. If I want to emphasize the dot product, or remind you of it, I’ll use a bullet (•) as its symbol: q • c. If you are uncertain about the element-wise or lockstep part, you can use Python’s zip function to help you out. It is designed precisely to march, in lockstep, through multiple sequences.

In [13]:

for q_i, c_i in zip(quantity, costs):
    print("{:2d} {:5.2f} --> {:5.2f}".format(q_i, c_i, q_i * c_i))

print("Total:",
      sum(q*c for q,c in zip(quantity,costs))) # cool-kid method
 2 12.50 --> 25.00
12  0.50 -->  6.00
 3  1.75 -->  5.25
Total: 36.25

Remember, we normally let NumPy—via np.dot—do that work for us!

2.5.1 Weighted Average

You might be familiar with a simple average—and now you’re wondering, “What is a weighted average?” To help you out, the simple average—also called the mean—is an equally weighted average computed from a set of values. For example, if I have three values (10, 20, 30), I divide up my weights equally among the three values and, presto, I get thirds: (1/3)·10 + (1/3)·20 + (1/3)·30. You might be looking at me with a distinct side eye, but if I rearrange that as (10 + 20 + 30)/3 you might be happier. I simply do sum(values)/3: add them all up and divide by the number of values. Look what happens, however, if I go back to the more expanded method:

In [14]:

values  = np.array([10.0, 20.0, 30.0])
weights = np.full_like(values, 1/3) # repeated (1/3)

print("weights:", weights)
print("via mean:", np.mean(values))
print("via weights and dot:", np.dot(weights, values))
weights: [0.3333 0.3333 0.3333]
via mean: 20.0
via weights and dot: 20.0

We can write the mean as a weighted sum—a sum product between values and weights. If we start playing around with the weights, we end up with the concept of weighted averages. With weighted averages, instead of using equal portions, we break the portions up any way we choose. In some scenarios, we insist that the portions add up to one. Let’s say we weighted our three values by 1/2, 1/4, 1/4. Why might we do this? These weights could express the idea that the first option is valued twice as much as the other two and that the other two are valued equally. It might also mean that the first one is twice as likely in a random scenario. These two interpretations are close to what we would get if we applied those weights to underlying costs or quantities. You can view them as two sides of the same double-sided coin.

In [15]:

values  = np.array([10,  20,  30])
weights = np.array([.5, .25, .25])

np.dot(weights, values)

Out[15]:

17.5

One special weighted average occurs when the values are the different outcomes of a random scenario and the weights represent the probabilities of those outcomes. In this case, the weighted average is called the expected value of that set of outcomes. Here’s a simple game. Suppose I roll a standard six-sided die and I get $1.00 if the die turns out odd and I lose $.50 if the die comes up even. Let’s compute a dot product between the payoffs and the probabilities of each payoff. My expected outcome is to make:

In [16]:

                  # odd, even
payoffs = np.array([1.0, -.5])
probs   = np.array([ .5,  .5])
np.dot(payoffs, probs)

Out[16]:

0.25

Mathematically, we write the expected value of the game as E(game) = Σᵢ pᵢvᵢ with p being the probabilities of the events and v being the values or payoffs of those events. Now, in any single run of that game, I’ll either make $1.00 or lose $.50. But, if I were to play the game, say 100 times, I’d expect to come out ahead by about $25.00—the expected gain per game times the number of games. In reality, this outcome is a random event. Sometimes, I’ll do better. Sometimes, I’ll do worse. But $25.00 is my best guess before heading into a game with 100 tosses. With many, many tosses, we’re highly likely to get very close to that expected value.

Here’s a simulation of 10000 rounds of the game. You can compare the outcome with np.dot(payoffs, probs) * 10000.

In [17]:

def is_even(n):
    # if remainder 0, value is even
    return n % 2 == 0

winnings = 0.0
for toss_ct in range(10000):
    die_toss = np.random.randint(1, 7)
    winnings += -0.5 if is_even(die_toss) else 1.0
print(winnings)
2542.0

2.5.2 Sums of Squares

One other, very special, sum-of-products is when both the quantity and the value are two copies of the same thing. For example, 5 · 5 + (−3) · (−3) + 2 · 2 + 1 · 1 = 5² + (−3)² + 2² + 1² = 25 + 9 + 4 + 1 = 39. This is called a sum of squares since each element, multiplied by itself, gives the square of the original value. Here is how we can do that in code:

In [18]:

values = np.array([5, -3, 2, 1])
squares = values * values # element-wise multiplication
print(squares,
      np.sum(squares),  # sum of squares. ha!
      np.dot(values, values), sep="\n")
[25 9 4 1]
39
39

If I wrote this mathematically, it would look like: dot(values, values) = Σᵢ vᵢvᵢ = Σᵢ vᵢ².

2.5.3 Sum of Squared Errors

There is another very common summation pattern, the sum of squared errors, that fits in here nicely. In this case of mathematical terminology, the red herring is both red and a herring. If I have a known value actual and I have your guess as to its value predicted, I can compute your error with error = predicted – actual.

Now, that error is going to be positive or negative based on whether you over- or underestimated the actual value. There are a few mathematical tricks we can pull to make the errors positive. They are useful because when we measure errors, we don’t want two wrongs—overestimating by 5 and underestimating by 5—to cancel out and make a right! The trick we will use here is to square the error: an error of 5 → 25 and an error of –5 → 25. If you ask about your total squared error after you’ve guessed 5 and –5, it will be 25 + 25 = 50.

In [19]:

errors = np.array([5, -5, 3.2, -1.1])
display(pd.DataFrame({'errors':errors,
                      'squared':errors*errors}))

 

   errors  squared
0  5.0000  25.0000
1 -5.0000  25.0000
2  3.2000  10.2400
3 -1.1000   1.2100

So, a squared error is calculated by error² = (predicted − actual)². And we can add these up with Σᵢ (predictedᵢ − actualᵢ)² = Σᵢ errorᵢ². This sum reads left to right as, “the sum of (open paren) errors which are squared (close paren).” It can be said more succinctly: the sum of squared errors. That looks a lot like the dot we used above:

In [20]:

np.dot(errors, errors)

Out[20]:

61.45

Weighted averages and sums of squared errors are probably the most common summation forms in machine learning. By knowing these two forms, you are now prepared to understand what’s going on mathematically in many different learning scenarios. In fact, much of the notation that obfuscates machine learning from beginners—while that same notation facilitates communication amongst experts!—is really just compressing these summation ideas into fewer and fewer symbols. You now know how to pull those ideas apart.

You might have a small spidey sense tingling at the back of your head. It might be because of something like this: c² = a² + b². I can rename or rearrange those symbols and get distance² = len₁² + len₂², or distance = √(run² + rise²) = √(Σᵢ xᵢ²). Yes, our old friends—or nemeses, if you prefer—Euclid and Pythagoras can be wrapped up as a sum of squares. Usually, the a and b are distances, and we can compute distances by subtracting two values—just like we do when we compare our actual and predicted values. Hold on to your seats. An error is just a length—a distance—between an actual and a predicted value!
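
To make that last point concrete, here is a tiny sketch (a supplement, with made-up actual and predicted values) that treats the errors as the legs of a right triangle and takes a square root to get a distance:

import numpy as np

actual    = np.array([10.0, 20.0, 30.0])
predicted = np.array([12.0, 18.0, 33.0])   # made-up guesses

errors = predicted - actual
sse    = np.dot(errors, errors)    # sum of squared errors
dist   = np.sqrt(sse)              # Euclidean distance between actual and predicted

print(sse, dist)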

2.6 A Geometric View: Points in Space

We went from checking out at the grocery store to discussing sums of squared errors. That’s quite a trip. I want to start from another simple daily scenario to discuss some basic geometry. I promise you that this will be the least geometry-class-like discussion of geometry you have ever seen.

2.6.1 Lines

Let’s talk about the cost of going to a concert. I hope that’s suitably nonacademic. To start with, if you drive a car to the concert, you have to park it. Up to 10 (good) friends going to the show can all fit in a minivan—packed in like a clown car, if need be. The group is going to pay one flat fee for parking. That’s good, because the cost of parking is usually pretty high: we’ll say $40. Let’s put that into code and pictures:

In [21]:

people = np.arange(1, 11)
total_cost = np.ones_like(people) * 40.0

ax = plt.gca()

ax.plot(people, total_cost)
ax.set_xlabel("# People")
ax.set_ylabel("Cost
(Parking Only)");
[Figure: total cost (parking only) versus number of people—a horizontal line at 40.]

In a math class, we would write this as total_cost = 40.0. That is, regardless of the number of people—moving back and forth along the x-axis at the bottom—we pay the same amount. When mathematicians start getting abstract, they reduce the expression to simply y = 40. They will talk about this as being “of the form” y = c. That is, the height or the y-value is equal to some constant. In this case, it’s the value 40 everywhere. Now, it doesn’t do us much good to park at the show and not buy tickets—although there is something to be said for tailgating. So, what happens if we have to pay $80 per ticket?

In [22]:

people = np.arange(1, 11)
total_cost = 80.0 * people + 40.0

Graphing this is a bit more complicated, so let’s make a table of the values first:

In [23]:

# .T (transpose) to save vertical space in printout
display(pd.DataFrame({'total_cost':total_cost.astype(int)},
                     index=people).T)

 

              1    2    3    4    5    6    7    8    9   10
total_cost  120  200  280  360  440  520  600  680  760  840

And we can plot that, point-by-point:

In [24]:

ax = plt.gca()
ax.plot(people, total_cost, 'bo')
ax.set_ylabel("Total Cost")
ax.set_xlabel("People");
[Figure: total cost versus number of people, plotted as a rising sequence of dots.]

So, if we were to write this in a math class, it would look like:

total_cost = ticket_cost × people + parking_cost

Let’s compare these two forms—a constant and a line—and the various ways they might be written in Table 2.1.

Table 2.1 Examples of constants and lines at different levels of language.

Name        Example                               Concrete                    Abstract        Mathese
Constant    total = parking                       total = $40                 y = 40          y = c
Line        total = ticket × person + parking     total = 80 × person + 40    y = 80x + 40    y = mx + b

I want to show off one more plot that emphasizes the two defining components of the lines: m and b. The m value—which was the $80 ticket price above—tells how much more we pay for each person we add to our trip. In math-speak, it is the rise, or increase in y for a single-unit increase in x. A unit increase means that the number of people on the x-axis goes from x to x + 1. Here, I’ll control m and b and graph it.

In [25]:

# paint by number
# create 100 x values from -3 to 3
xs = np.linspace(-3, 3, 100)

# slope (m) and intercept (b)
m, b = 1.5, -3

ax = plt.gca()

ys = m*xs + b
ax.plot(xs, ys)

ax.set_ylim(-4, 4)
high_school_style(ax) # helper from mlwpy.py

ax.plot(0, -3,'ro') # y-intercept
ax.plot(2,  0,'ro') # two steps right gives three steps up

# y = mx + b with m=0 gives y = b
ys = 0*xs + b
ax.plot(xs, ys, 'y');
[Figure: a line with slope 1.5 passing through (0, −3) and (2, 0), plus a horizontal line through y = −3.]

Since our slope is 1.5, taking two steps to the right results in us gaining three steps up. Also, if we have a line and we set the slope of the line m to 0, all of a sudden we are back to a constant. Constants are a specific, restricted type of line: a horizontal one. Our yellow line, which passes through y = −3, is one.

We can combine our ideas about np.dot with our ideas about lines and write some slightly different code to draw this graph. Instead of using the pair (m, b), we can write an array of values w = (w₁, w₀). One trick here: I put the w₀ second, to line up with the b. Usually, that’s how it is written in mathese: the w₀ is the constant.

With the ws, we can use np.dot if we augment our xs with an extra column of ones. I’ll write that augmented version of xs as xs_p1 which you can read as “exs plus a column of ones.” The column of ones serves the role of the 1 in y = mx + b. Wait, you don’t see a 1 there? Let me rewrite it: y = mx + b = mx + b · 1. See how I rewrote b as b · 1? That’s the same thing we need to do to make np.dot happy. dot wants to multiply something times w₁ and something times w₀. We make sure that whatever gets multiplied by w₀ is a 1.

I call this process of tacking on a column of ones the plus-one trick or +1 trick and I’ll have more to say about it shortly. Here’s what the plus-one trick does to our raw data:

In [26]:

# np.c_[] lets us create an array column-by-column
xs    = np.linspace(-3, 3, 100)
xs_p1 = np.c_[xs, np.ones_like(xs)]

# view the first few rows
display(pd.DataFrame(xs_p1).head())

 

         0       1
0  -3.0000  1.0000
1  -2.9394  1.0000
2  -2.8788  1.0000
3  -2.8182  1.0000
4  -2.7576  1.0000

Now, we can combine our data and our weights very concisely:

In [27]:

w  = np.array([1.5, -3])
ys = np.dot(xs_p1, w)

ax = plt.gca()
ax.plot(xs, ys)

# styling
ax.set_ylim(-4, 4)
high_school_style(ax)
ax.plot(0, -3,'ro') # y-intercept
ax.plot(2, 0,'ro'); # two steps to the right should be three whole steps up
[Figure: the same line through (0, −3) and (2, 0), this time computed with the dot product.]

Here are the two forms we used in the code: ys = m*xs + b and ys = np.dot(xs_p1, w). Mathematically, these look like y = mx + b and y = w • x+. Here, I’m using x+ as an abbreviation for the x that has ones tacked on to it. The two forms defining ys mean the same thing. They just have some differences when we implement them. The first form has each of the components standing on its own. The second form requires x to be augmented with a 1—giving x+—and allows us to conveniently use the dot product.
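
If you want reassurance that the two forms agree, a one-line check (a supplement, reusing xs, xs_p1, m, b, and w from the cells above) does the trick:

# both recipes for ys should produce exactly the same values
print(np.allclose(m*xs + b, np.dot(xs_p1, w)))   # True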

2.6.2 Beyond Lines

We can extend the idea of lines in at least two ways. We can progress to wiggly curves and polynomials—equations like f(x) = x³ + x² + x + 1. Here, we have a more complex computation on one input value x. Or, we can go down the road to multiple dimensions: planes, hyperplanes, and beyond! For example, in f(x, y, z) = x + y + z we have multiple input values that we combine together. Since we will be very interested in multivariate data—that’s multiple inputs—I’m going to jump right into that.

Let’s revisit the rock concert scenario. What happens if we have more than one kind of item we want to purchase? For example, you might be surprised to learn that people like to consume beverages at concerts. Often, they like to consume what my mother affectionately refers to as “root beer.” So, what if we have a cost for parking, a cost for the tickets, and a cost for each root beer our group orders. To account for this, we need a new formula. With rb standing for root beer, we have:

total_cost = ticket_cost × number_people + rb_cost × number_rbs + parking_cost

If we plug in some known values for parking cost, cost per ticket, and cost per root beer, then we have something more concrete:

total_cost = 80 × number_people + 10 × number_rbs + 40

With one item, we have a simple two-dimensional plot of a line where one axis direction comes from the input “how many people” and the other comes from the output “total cost”. With two items, we now have two how many’s but still only one total_cost, for a total of three dimensions. Fortunately, we can still draw that somewhat reasonably. First, we create some data:

In [28]:

number_people = np.arange(1, 11) # 1-10 people
number_rbs    = np.arange(0, 20) # 0-19 rootbeers

# numpy tool to get cross-product of values (each against each)
# in two paired arrays. try it out: np.meshgrid([0, 1], [10, 20])
# "perfect" for functions of multiple variables
number_people, number_rbs = np.meshgrid(number_people, number_rbs)

total_cost = 80 * number_people + 10 * number_rbs + 40

We can look at that data from a few different angles—literally. Below, we show the same graph from five different viewpoints. Notice that they are all flat surfaces, but the apparent tilt or slope of the surface looks different from different perspectives. The flat surface is called a plane.

In [29]:

# import needed for 'projection':'3d'
from mpl_toolkits.mplot3d import Axes3D
fig,axes = plt.subplots(2, 3,
                        subplot_kw={'projection':'3d'},
                        figsize=(9, 6))

angles = [0, 45, 90, 135, 180]
for ax,angle in zip(axes.flat, angles):
    ax.plot_surface(number_people, number_rbs, total_cost)
    ax.set_xlabel("People")
    ax.set_ylabel("RootBeers")
    ax.set_zlabel("TotalCost")
    ax.azim = angle

# we don't use the last axis
axes.flat[-1].axis('off')
fig.tight_layout()
[Figure: the cost plane drawn from five different viewing angles.]

It is pretty straightforward, in code and in mathematics, to move beyond three dimensions. However, if we try to plot it out, it gets very messy. Fortunately, we can use a good old-fashioned tool—that’s a GOFT to those in the know—and make a table of the outcomes. Here’s an example that also includes some food for our concert goers. We’ll chow on some hotdogs at $5 per hotdog:

total_cost = 80 × number_people + 10 × number_rbs + 5 × number_hotdogs + 40

We’ll use a few simple values for the counts of things in our concert-going system:

In [30]:

number_people  = np.array([2, 3])
number_rbs     = np.array([0, 1, 2])
number_hotdogs = np.array([2, 4])

costs = np.array([80, 10, 5])

columns = ["People", "RootBeer", "HotDogs", "TotalCost"]

I pull off combining several numpy arrays in all possible combinations, similar to what itertools’s product function does, with a helper np_cartesian_product. It involves a bit of black magic, so I’ve hidden it in mlwpy.py. Feel free to investigate, if you dare.

In [31]:

counts = np_cartesian_product(number_people,
                              number_rbs,
                              number_hotdogs)

totals = (costs[0] * counts[:, 0] +
          costs[1] * counts[:, 1] +
          costs[2] * counts[:, 2] + 40)

display(pd.DataFrame(np.c_[counts, totals],
                     columns=columns).head(8))

 

   People  RootBeer  HotDogs  TotalCost
0       2         0        2        210
1       2         0        4        220
2       3         0        2        290
3       3         0        4        300
4       2         1        2        220
5       2         1        4        230
6       3         1        2        300
7       3         1        4        310

The assignment to totals in the previous cell is pretty ugly. Can we improve it? Think! Think! There must be a better way! What is going on there? We are adding several things up. And the things we are adding come from being multiplied together element-wise. Can it be? Is it a dot product? Yes, it is.

In [32]:

costs  = np.array([80, 10, 5])
counts  = np_cartesian_product(number_people,
                              number_rbs,
                              number_hotdogs)

totals = np.dot(counts, costs) + 40
display(pd.DataFrame(np.column_stack([counts, totals]),
                     columns=columns).head(8))

 

   People  RootBeer  HotDogs  TotalCost
0       2         0        2        210
1       2         0        4        220
2       3         0        2        290
3       3         0        4        300
4       2         1        2        220
5       2         1        4        230
6       3         1        2        300
7       3         1        4        310

Using the dot product gets us two wins: (1) the line of code that assigns to totals is drastically improved and (2) we can more or less arbitrarily extend our costs and counts without modifying our calculating code at all. You might notice that I tacked the +40 on there by hand. That’s because I didn’t want to go back to the +1 trick—but I could have.
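
For the curious, here is what folding the $40 in with the +1 trick might look like (a sketch reusing the counts and costs from the cells above; the _p1 names are my own):

# treat parking as one more "cost" whose count is always 1
costs_p1  = np.array([80, 10, 5, 40])
counts_p1 = np.c_[counts, np.ones(len(counts), dtype=int)]

totals_p1 = np.dot(counts_p1, costs_p1)
print(np.allclose(totals_p1, np.dot(counts, costs) + 40))   # True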

Incidentally, here’s what would have happened in a math class. As we saw with the code-line compression from repeated additions to dot, details often get abstracted away or moved behind the scenes when we break out advanced notation. Here’s a detailed breakdown of what happened. First, we abstract by removing detailed variable names and then replacing our known values by generic identifiers:

y = 80x₃ + 10x₂ + 5x₁ + 40
y = w₃x₃ + w₂x₂ + w₁x₁ + w₀ · 1

We take this one step further in code by replacing the wx sums with a dot product:

y = w[3,2,1] • x + w₀ · 1

The weird [3, 2, 1] subscript on the w indicates that we aren’t using all of the weights. Namely, we are not using the w₀ in the left-hand term. w₀ is in the right-hand term multiplying 1. It is only being used once. The final coup de grâce is to perform the +1 trick:

y = w • x+

To summarize, instead of y = w₃x₃ + w₂x₂ + w₁x₁ + w₀, we can write y = w • x+.

2.7 Notation and the Plus-One Trick

Now that you know what the plus-one trick is, I want to show a few different ways that we can talk about a table of data. That data might be made of values, such as our expense sheet for the trip to the ball park. We can take the table and draw some brackets around it:

D = [ x₂  x₁ |  y ]
    [  3  10 |  3 ]
    [  2  11 |  5 ]
    [  1  12 | 10 ]

We can also refer to the parts of it: D = (x, y). Here, x means all of the input features and y means the output target feature. We can emphasize the columns:

D = (x, y) = (x_f, ..., x_1, y)

f is the number of features. We’re counting backwards to synchronize our weights with the discussion in the prior section. In turn, the weights were backwards so we could count down to the constant term at w0. It is quite a tangled web.

We can also emphasize the rows:

D = [ e₁  ]
    [ e₂  ]
    [ ... ]
    [ eₙ  ]

Think of eᵢ as one example. n is the number of examples.

Also, for mathematical convenience—really—we will often use the augmented versions, the plus-one trick, of D and x:

D+ = (x+, y) = [ x₂  x₁  x₀ |  y ]
               [  3  10   1 |  3 ]
               [  2  11   1 |  5 ]
               [  1  12   1 | 10 ]

Let’s break that down:

x = [ x₂  x₁ ]
    [  3  10 ]
    [  2  11 ]
    [  1  12 ]

If we want to use that with a 2D formula, we end up writing: y = w₂x₂ + w₁x₁ + w₀. And we can compress that as: y = w[2,1] • x + w₀. Again, the w[2,1] is hinting that we aren’t using w₀ in the •. Still, there is a certain ugliness about the w₀ tacked on at the end. We can compress even further if we use an augmented version of x:

x+ = [ x₂  x₁  x₀ ]
     [  3  10   1 ]
     [  2  11   1 ]
     [  1  12   1 ]

Now, our 2D formula looks like y = w₂x₂ + w₁x₁ + w₀x₀. Note the additional x₀. That fits nicely into y = w • x+, where w is (w₂, w₁, w₀). The augmented version of w now includes w₀, which was previously a weight without a home. When I want to remind you that we are dealing with x+ or D+, I’ll say we are using the +1 trick. We’ll connect this mathematical notation to our Python variables in Section 3.3.
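
As a bridge back to code, here is a sketch of that small table as NumPy arrays (a supplement; the weights are made-up values, purely for illustration):

import numpy as np

x = np.array([[3, 10],
              [2, 11],
              [1, 12]])              # columns are x₂, x₁
y = np.array([3, 5, 10])             # the target column, shown for completeness

# the +1 trick: tack on the x₀ column of ones
x_p1 = np.c_[x, np.ones(len(x), dtype=int)]

w = np.array([2.0, 0.5, -1.0])       # hypothetical (w₂, w₁, w₀)
print(np.dot(x_p1, w))               # one prediction per example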

2.8 Getting Groovy, Breaking the Straight-Jacket, and Nonlinearity

So, we just took an unsuspecting line and extended it past its comfort zone—maybe past yours as well. We did it in one very specific way: we added new variables. These new variables represented new graphical dimensions. We moved from talking about lines to talking about planes and their higher-dimensional cousins.

There is another way in which we can extend the idea of a line. Instead of adding new information—more variables or features—we can add complexity to the information we already have. Imagine moving from y = 3 to y = 2x + 3 to y = x² + 2x + 3. In each case, we’ve added a term to the equation. As we add terms there, we go from a flat line to a sloped line to a parabola. I’ll show these off graphically in a second. The key point is: we still only have one input variable. We’re simply using that single input in different ways.

Mathematicians talk about these extensions as adding higher-order or higher-power terms of the original variable to the equation. As we extend our powers, we get all sorts of fancy names for the functions: constant, linear, quadratic, cubic, quartic, quintic, etc. Usually, we can just call them n-th degree polynomials, where n is the highest non-zero power in the expression. A 2nd degree polynomial—for example, y = x² + x + 1—is also called a quadratic polynomial. These give us single-bend curves called parabolas.

np.poly1d gives us an easy helper to define polynomials by specifying the leading coefficients on each term in the polynomial. For example, we specify 2x² + 3x + 4 by passing in a list of [2, 3, 4]. We’ll use some random coefficients to get some interesting curves.

In [33]:

fig, axes = plt.subplots(2, 2)
fig.tight_layout()

titles = ["$y=c_0$",
          "$y=c_1x+c_0$",
          "$y=c_2x^2+c_1x+c_0$",
          "$y=c_3x^3+c_2x^2+c_1x+c_0$"]

xs = np.linspace(-10, 10, 100)
for power, (ax, title) in enumerate(zip(axes.flat, titles), 1):
     coeffs = np.random.uniform(-5, 5, power)
     poly = np.poly1d(coeffs)
     ax.plot(xs, poly(xs))
     ax.set_title(title)
[Figure: four panels showing a randomly generated constant, line, parabola, and cubic.]

Massaging the general forms of these equations towards our earlier linear equation y₁ = c₁x + c₀ gets us to things like y₂ = c₂x² + c₁x + c₀. One quick note: x = x¹ and 1 = x⁰. While I can insert suitable mathese here, trust me that there are very good reasons to define 0⁰ = 1. Taken together, we have

y₂ = c₂x² + c₁x¹ + c₀x⁰ = Σᵢ cᵢxⁱ  (summing over i = 0, 1, 2)

You know what I’m about to say. Go ahead, play along and say it with me. You can do it. It’s a dot product! We can turn that equation into code by breaking up the xⁱ and the coefficients cᵢ and then combining them with a np.dot.

In [34]:

plt.figure(figsize=(2, 1.5))

xs = np.linspace(-10, 10, 101)
coeffs = np.array([2, 3, 4])
ys = np.dot(coeffs, [xs**2, xs**1, xs**0])

# nice parabola via a dot product
plt.plot(xs, ys);
[Figure: an upward-opening parabola produced by the dot-product calculation.]

2.9 NumPy versus “All the Maths”

Since the dot product is so fundamental to machine learning and since NumPy’s np.dot has to deal with the practical side of Pythonic computation—as opposed to the pure, Platonic, mathematical world of ideals—I want to spend a few minutes exploring np.dot and help you understand how it works in some common cases. More importantly, there is one common form that we’d like to use but can’t without some minor adjustments. I want you to know why. Here goes.

We talked about the fact that np.dot multiplies things element-wise and then adds them up. Here’s just about the most basic example with a 1D array:

In [35]:

oned_vec = np.arange(5)
print(oned_vec, "-->", oned_vec * oned_vec)
print("self dot:", np.dot(oned_vec, oned_vec))
[0 1 2 3 4] --> [ 0 1 4 9 16]
self dot: 30

The result is the sum of squares of that array. Here’s a simple example using a row and a column:

In [36]:

row_vec = np.arange(5).reshape(1, 5)
col_vec = np.arange(0, 50, 10).reshape(5, 1)

Notice that row_vec is shaped like a single example and col_vec is shaped like a single feature.

In [37]:

print("row vec:", row_vec,
      "col_vec:", col_vec,
      "dot:", np.dot(row_vec, col_vec), sep='
')
row vec: [[0 1 2 3 4]]
col_vec:
[[ 0]
 [10]
 [20]
 [30]
 [40]]
dot:
 [[300]]

So, far, we’re mostly good. But what happens if we swap the order? You might expect to get the same answer: after all, in basic arithmetic 3 × 5 = 5 × 3. Let’s check it out:

In [38]:

out = np.dot(col_vec, row_vec)
print(out)
[[ 0  0  0   0    0]
 [ 0 10 20  30   40]
 [ 0 20 40  60   80]
 [ 0 30 60  90  120]
 [ 0 40 80 120 160]]

Cue Dorothy: “Toto, I’ve a feeling we’re not in Kansas anymore.” What happened here? We’ll focus on one output element—the 20 in the second-from-the-top row—to get a handle on the craziness we unleashed. Where does it come from? Well, we never really defined how the output is produced—except to say that it does a sum product on two 1D arrays. Let’s remedy that.

Pick an element in the output, out[1, 2]. That’s row 1 and column 2, if we start our counting from zero. out[1, 2] has the value 20. Where does this 20 come from? It comes from taking a dot product on row 1 of col_vec with column 2 of row_vec. That’s actually the definition of what np.dot does. The source values are col_vec[1,:] which is [10] and row_vec[:, 2] which is [2]. Putting those together gives 10 × 2 → 20 with no additional summing needed because we only have one value in each. You can go through a similar process for the other entries.

Mathematically, this is written as out[i, j] = dot(left[i, :], right[:, j]) where dot is our friendly sum product over 1D things. So, the output row i comes from the left input’s row i and the output’s column j comes from the right input’s column j. Taking from each row and each column gives a 5 × 5 result.
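
We can check that recipe directly for the element we singled out (a quick supplement, reusing out, col_vec, and row_vec from above):

# out[1, 2] should match a little dot product of row 1 (left) with column 2 (right)
print(out[1, 2], np.dot(col_vec[1, :], row_vec[:, 2]))   # 20 and 20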

If we apply the same logic to the row-column case, we see

In [39]:

out = np.dot(row_vec, col_vec)
out

Out[39]:

array([[300]])

The result is 1 × 1, so out[0, 0] comes from row 0 of row_vec and column 0 of col_vec. Which is exactly the sum product over [0, 1, 2, 3, 4] and [0, 10, 20, 30, 40], which gives us 0*0 + 1*10 + 2*20 + 3*30 + 4*40. Great.

2.9.1 Back to 1D versus 2D

However, when we use a mix of 1D and 2D inputs, things are more confusing because the input arrays are not taken at face value. There are two important consequences for us: (1) the order matters in multiplying a 1D and a 2D array and (2) we have to investigate the rules np.dot follows for handling the 1D array.

In [40]:

col_vec = np.arange(0, 50, 10).reshape(5, 1)
row_vec = np.arange(0, 5).reshape(1, 5)

oned_vec = np.arange(5)

np.dot(oned_vec, col_vec)

Out[40]:

array([300])

If we trade the order, Python blows up on us:

In [41]:

try:
    np.dot(col_vec, oned_vec) # *boom*
except ValueError as e:
    print("I went boom:", e)
I went boom: shapes (5,1) and (5,) not aligned: 1 (dim 1) != 5 (dim 0)

So, np.dot(oned_vec, col_vec) works and np.dot(col_vec, oned_vec) fails. What’s going on? If we look at the shapes of the guilty parties, we can get a sense of where things break down.

In [42]:

print(oned_vec.shape,
      col_vec.shape, sep="\n")
(5,)
(5, 1)

You might consider the following exercise: create a 1D numpy array and look at its shape using .shape. Transpose it with .T. Look at the resulting shape. Take a minute to ponder the mysteries of the NumPy universe. Now repeat with a 2D array. These might not be entirely what you were expecting.
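
If you want to spoil the surprise, here is roughly how that exercise plays out (a supplemental sketch):

oned = np.arange(5)
print(oned.shape, oned.T.shape)   # (5,) (5,)  -- transposing a 1D array changes nothing

twod = np.arange(5).reshape(1, 5)
print(twod.shape, twod.T.shape)   # (1, 5) (5, 1)  -- a 2D array really does flip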

np.dot is particular about how these shapes align. Let’s look at the row cases:

In [43]:

print(np.dot(row_vec, oned_vec))
try: print(np.dot(oned_vec, row_vec))
except: print("boom")
[30]
boom

Here is a summary of what we found:

form                        left-input   right-input   success?
np.dot(oned_vec, col_vec)   (5,)         (5, 1)        works
np.dot(col_vec, oned_vec)   (5, 1)       (5,)          fails
np.dot(row_vec, oned_vec)   (1, 5)       (5,)          works
np.dot(oned_vec, row_vec)   (5,)         (1, 5)        fails

For the working cases, we can see what happens if we force-reshape the 1D array:

In [44]:

print(np.allclose(np.dot(oned_vec.reshape(1, 5), col_vec),
                  np.dot(oned_vec,               col_vec)),
      np.allclose(np.dot(row_vec, oned_vec.reshape(5, 1)),
                  np.dot(row_vec, oned_vec)))
True True

Effectively, for the cases that work, the 1D array is bumped up to (1, 5) if it is on the left and to (5, 1) if it is on the right. Basically, the 1D receives a placeholder dimension on the side it shows up in the np.dot. Note that this bumping is not using NumPy’s full, generic broadcasting mechanism between the two inputs; it is more of a special case.

Broadcasting two arrays against each other in NumPy will result in the same shape whether you are broadcasting a against b or b against a. Even so, you can mimic np.dot(col_vec, row_vec) with broadcasting and multiplication. If you do that, you get the “big array” result: it’s called an outer product.
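
Here’s a brief sketch of that mimicry (a supplement, reusing col_vec and row_vec):

# broadcasting (5,1) against (1,5) gives the same 5x5 "big array" as np.dot
via_broadcast = col_vec * row_vec
via_outer     = np.outer(col_vec, row_vec)

print(np.allclose(via_broadcast, np.dot(col_vec, row_vec)),
      np.allclose(via_outer,     np.dot(col_vec, row_vec)))   # True True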

With all of that said, why do we care? Here’s why:

In [45]:

D = np.array([[1, 3],
              [2, 5],
              [2, 7],
              [3, 2]])
w = np.array([1.5, -3.0])

This works:

In [46]:

np.dot(D,w)

Out[46]:

array([ -7.5, -12. , -18. , -1.5])

This fails:

In [47]:

try:
    np.dot(w,D)
except ValueError:
    print("BOOM. :sadface:")
BOOM. :sadface:

And sometimes, we just want the code to look like our math:

y = wD

What do we do if we don’t like the interface we are given? If we are willing to (1) maintain, (2) support, (3) document, and (4) test an alternative, then we can make an interface that we prefer. Usually people only think about the implementation step. That’s a costly mistake.

Here is a version of dot that plays nicely with a 1D input as the first argument that is shaped like a column:

In [48]:

def rdot(arr,brr):
    'reversed-argument version of np.dot'
    return np.dot(brr,arr)
rdot(w, D)

Out[48]:

array([ -7.5, -12. , -18. , -1.5])

You might complain that we are going through contortions to make the code look like the math. That’s fair. Even in math textbooks, people will do all sorts of weird gymnastics to make this work: w might be transposed. In NumPy, this is fine, if it is 2D. Unfortunately, if it is only a 1D NumPy array, transposing does nothing. Try it yourself! Another gymnastics routine math folks will perform is to transpose the data—that is, they make each feature a row. Yes, really. I’m sorry to be the one to tell you about that. We’ll just use rdot—short for “reversed arguments to np.dot”—when we want our code to match the math.

Dot products are ubiquitous in the mathematics of learning systems. Since we are focused on investigating learning systems through Python programs, it is really important that we (1) understand what is going on with np.dot and (2) have a convenient and consistent form for using it. We’ll see rdot in our material on linear and logistic regression. It will also play a role in several other techniques. Finally, it is fundamental in showing the similarities of a wide variety of learning algorithms.

2.10 Floating-Point Issues

Prepare yourself to be grumpy.

In [49]:

1.1 + 2.2 == 3.3

Out[49]:

False

I can hear you now. You want your money back—for this book, for your Python program, for everything. It’s all been a lie. Drama aside, what is happening here? The issue is floating-point numbers and our expectations. In the Python code above, all of the values are floats:

In [50]:

type(1.1), type(2.2), type(1.1+2.2), type(3.3)

Out[50]:

(float, float, float, float)

float is short for floating-point number, and floats are how decimal values are usually represented on a computer. When we use floats in our programs, we are often thinking about two different types of numbers: (1) simple decimal values like 2.5 and (2) complicated real numbers like π which go on forever, even though we may get away with approximations like 3.14. Both of these have complications when we go from our thoughts about these numbers to the computer’s number-crunching machinery.

Here are a few facts:

  1. Computer memory is finite. We can’t physically store an infinite number of digits for any numerical value.

  2. Some numbers that interest us have an infinite number of decimal places (1/9 and π, I’m looking at you).

  3. Computers store all of their information in bits—that’s base-2 numbers, or binary.

  4. There are different infinite-digit numbers when we write them in decimal versus binary.

Because of points one and two, we have to approximate the values we store. We can get close, but we can never be exact. Because of points three and four, when we convert from a seemingly innocent decimal number like 3.3 to binary, it may become much more complicated—it might have repeating digits, like 1/9 does in a decimal representation. Putting these pieces together means that we can’t rely on exact comparisons for floating-point values.

So, what can we do? We can ask if values are close enough:

In [51]:

np.allclose(1.1 + 2.2, 3.3)

Out[51]:

True

Here, numpy is checking if the numbers are the same for many, many decimal places—out to the point where the difference is insignificant. If we care, we can define our own tolerance for what is and isn’t significant.
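
For example, np.allclose and its relatives take rtol and atol arguments, and the standard library has math.isclose. Here is a tiny sketch (a supplement) of dialing the tolerance up and down:

import math
import numpy as np

print(np.allclose(1.1 + 2.2, 3.3),                        # default tolerances: close enough
      np.isclose(1.1 + 2.2, 3.3, rtol=0, atol=1e-20),     # absurdly strict: no longer "equal"
      math.isclose(1.1 + 2.2, 3.3, rel_tol=1e-9))         # the standard library flavor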

2.11 EOC

2.11.1 Summary

We covered a lot of ideas in this chapter and laid the groundwork to talk about learning in an intelligent way. In many cases, we won’t be diving into mathematical details of learning algorithms. However, when we talk about them, we will often appeal to probability, geometry, and dot products in our descriptions. Hopefully, you now have better intuitions about what these terms and symbols mean—particularly if no one has taken the time to explain them to you in a concrete fashion before.

2.11.2 Notes

While we took an intuitive approach to describing distributions, they have concrete mathematical forms which can be extended to multiple dimensions. The discrete uniform distribution looks like:

f(x) = 1/k

Here, k is the number of possible events—six for a typical die or two for a coin flip. The equation for the normal distribution is

f(x) = (1 / (vₘ · spread)) · e^(−½((x − center) / spread)²)

The e, combined with a negative power, is responsible for the fast dropoff away from the center. vₘ, a magic value, is really just there to make sure that all the possibilities sum up to one like all good distributions: it is vₘ = √(2π) but I won’t quiz you on that. The center and spread are normally called the mean and standard deviation and written with μ and σ which are the lowercase Greek mu and sigma. The normal distribution shows up everywhere in statistics: in error functions, in binomial approximations (which we used to generate our normal shapes), and in central limit theorems.
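
If you’d like to see that formula and scipy’s version agree, here is a short sketch (a supplement; the center and spread values are arbitrary):

import numpy as np
import scipy.stats as ss

center, spread = 5.0, 2.0                    # mean and standard deviation
xs = np.linspace(-1.0, 11.0, 7)

by_hand  = (1 / (np.sqrt(2*np.pi) * spread) *
            np.exp(-.5 * ((xs - center) / spread)**2))
by_scipy = ss.norm.pdf(xs, loc=center, scale=spread)

print(np.allclose(by_hand, by_scipy))        # True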

Python uses 0-based indexing while mathematicians often use 1-based indexing. That’s because mathematicians are generally counting things and computer scientists have historically cared about offsets: from the start, how many steps do I need to move forward to get to the item I need? If I’m at the start of a list or an array, I have to take zero steps to get the first item: I’m already there. A very famous computer scientist, Edsger Dijkstra, wrote an article called “Why numbering should start at zero.” Check it out if you are interested and want to win on a computer-science trivia night.

In my mathematical notation, I’m following classic Python’s lead and letting both ( ) and [ ] represent ordered things. { } is used for unordered groups of things—imagine putting things in a large duffel bag and then pulling them back out. The duffel bag doesn’t remember the order things were put in it. In a relatively recent change, Python dictionaries in Python 3.7 now have some ordering guarantees to them. So, strictly speaking—after upgrading to the latest Python—I’m using the curly braces in the mathematical set sense.

The phrase “there must be a better way!”—particularly in the Python community—deserves a hat tip to Raymond Hettinger, a core Python developer. His Python talks are legendary: find them on YouTube and you’ll learn something new about Python.
