Chapter 1. Basic Math and Calculus Review

Before we dive into the applied areas of essential math such as probability, linear algebra, statistics, and machine learning, we should probably review a few basic math and calculus concepts. Before you drop this book and run screaming, do not worry! I will present how to calculate slopes and areas for a function in a way you were probably not taught in college. We have Python on our side, not a pencil and paper. We can create some fairly simple functions to do calculus from scratch, calculating derivatives and integrals, which give slopes and areas for functions respectively.

We will also review what numbers are and how variables and functions work on a Cartesian system. We will then cover exponents and logarithms. After that we will learn the two basic operations of calculus: derivatives and integrals.

I will make these topics as tight and practical as possible, focusing only on what will help us in later chapters and fall under the “essential math” umbrella. This is by no means a comprehensive review of high school and college math. If you want that, a great book to check out is No Bullshit Guide to Math and Physics by Ivan Savov. The first few chapters contain the best crash course on high school and college math I have ever seen. The book Mathematics 1001 by Dr. Richard Elwes has some great content as well, and in bite-sized explanations.

Number Theory

What are numbers? I promise to not be too philosophical in this book, but are numbers not a construct we have defined? Why do we have the digits 0 through 9, and not have more digits than that? Why do we have fractions and decimals and not just whole numbers? This area of math where we muse about numbers and why we designed them a certain way is known as number theory.

Number theory goes all the way back to ancient times, when mathematicians studied different number systems, and it explains why we have accepted them the way we do today. Here are different number systems that you may recognize:

Natural Numbers

These are the numbers 0, 1, 2, 3, 4, 5… and so on. Only 0 and positive whole numbers are included here, and they form the earliest known number system. Natural numbers are so ancient that cavemen scratched tally marks on bones and cave walls to keep records. The concept of “0” was accepted later, and the Babylonians developed the useful idea of place-holding notation for empty “columns” on numbers greater than 9, such as “10”, “1000”, or “1090.” Those zeros indicate no value occupying that column.


Integers

Integers include positive and negative whole numbers as well as 0. We may take them for granted, but ancient mathematicians deeply distrusted the idea of negative numbers. But when you subtract 5 from 3, you get -2. This is especially useful when it comes to finances, where we measure profits and losses. In 628 AD, an Indian mathematician named Brahmagupta showed why negative numbers were necessary for arithmetic to progress, and therefore integers became accepted.

Rational Numbers

Any number that you can express as a fraction, such as $\frac{2}{3}$, is a rational number. This includes all finite decimals and integers since they can be expressed as fractions too, such as $\frac{687}{100} = 6.87$ and $\frac{2}{1} = 2$ respectively. They are called rational because they are ratios. Rational numbers were quickly deemed necessary because time, resources, and other quantities could not always be measured in discrete units. Milk does not always come in gallons. We may have to measure it as parts of a gallon. If I run for 12 minutes, I cannot be forced to measure in whole miles when in actuality I ran $\frac{9}{10}$ of a mile.

Irrational Numbers

Irrational numbers cannot be expressed as a fraction. This includes the famous Pi π, square roots of certain numbers like $\sqrt{2}$, and Euler’s number e which we will learn about later. These numbers have an infinite number of decimal digits, such as π=3.141592653589793238462...

There is an interesting history behind irrational numbers. The Greek mathematician Pythagoras believed all numbers are rational. He believed this so fervently, he made a religion that prayed to the number 10. "Bless us, divine number, thou who generated gods and men!" he and his followers would pray (why “10” was so special, I do not know). There is a legend that one of his followers, Hippasus, proved not all numbers are rational simply by demonstrating the square root of 2. This severely messed with Pythagoras’ belief system, and he responded by drowning Hippasus out at sea.

Regardless, we now know not all numbers are rational.

Real Numbers

Real numbers include rational as well as irrational numbers. In practice, when you are doing any data science work you can treat any decimals you work with as real numbers.

Complex and Imaginary Numbers

You encounter this number type when you take the square root of a negative number. While imaginary and complex numbers have relevance in certain types of problems, we will mostly steer clear of them for our purposes, with rare exceptions.

In data science, you will find most (if not all) your work will be using natural numbers, integers, and real numbers. Imaginary numbers may be encountered in more advanced use cases such as matrix decomposition, which we will touch on in Chapter 4.
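
As a rough illustration of how these number systems surface in code, here is a minimal sketch using only the Python standard library. The variable names are hypothetical and chosen just for this example, and the mapping from number systems to Python types is a loose one: a float, for instance, only approximates a real number.

```python
from cmath import sqrt as complex_sqrt
from fractions import Fraction

natural = 7                     # a natural number, stored as an int
integer = 3 - 5                 # subtracting 5 from 3 gives the integer -2
rational = Fraction(687, 100)   # the ratio 687/100, kept exact
irrational_approx = 2 ** 0.5    # a float that only approximates the square root of 2
imaginary = complex_sqrt(-1)    # square root of a negative number is complex

print(integer)                  # -2
print(float(rational))          # 6.87
print(irrational_approx)        # approximately 1.4142
print(imaginary)                # 1j
```

Note that the irrational value is stored to a finite number of digits, which is why treating decimals as real numbers is a practical convention rather than an exact one.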

Further Resources

If you do want to learn about imaginary numbers, there is a great playlist on YouTube titled Imaginary Numbers Are Real.

Order of Operations

Hopefully you are familiar with order of operations, which is the order in which you solve each part of a mathematical expression. As a brief refresher, recall that you evaluate components in parentheses, followed by exponents, then multiplication, division, addition, and subtraction. You can remember the order of operations by the mnemonic device PEMDAS (Please Excuse My Dear Aunt Sally), which corresponds to the ordering parentheses, exponents, multiplication, division, addition, and subtraction.

Take for example this expression:

$$2 \times \frac{(3 + 2)^2}{5} - 4$$

First we evaluate the parentheses (3 + 2), which evaluates to 5.

$$2 \times \frac{(5)^2}{5} - 4$$

Next we solve the exponent, which we can see is squaring that 5 we just summed. That is 25.

$$2 \times \frac{25}{5} - 4$$

Next up we have multiplication and division. The ordering of these two is swappable since division is also multiplication (using fractions). Let’s go ahead and multiply the 2 with the $\frac{25}{5}$, yielding $\frac{50}{5}$.

$$\frac{50}{5} - 4$$

Next we will perform the division, dividing 50 by 5, which will yield 10.

$$10 - 4$$

And finally we perform any addition and subtraction. Of course, 10 - 4 is going to give us 6.

$$6$$

Sure enough, if we were to express this in Python we would print a value of 6.0 as shown in Example 1-1.

Example 1-1. Solving an expression in Python
my_value = 2 * (3 + 2)**2 / 5 - 4

print(my_value) # prints 6.0

This may be elementary to you, but it is still critical nonetheless. In code, even if you get the correct result without them, it is a good practice to liberally use parentheses in complex expressions so you establish control of the evaluation order. Here I group the fractional part of my expression in parentheses, helping to set it apart from the rest of the expression in Example 1-2.

Example 1-2. Making use of parentheses for clarity in Python
my_value = 2 * ((3 + 2)**2 / 5) - 4

print(my_value) # prints 6.0

While both examples are technically correct, the latter one is clearer to us easily confused humans. If you or someone else makes changes to your code, the parentheses provide an easy reference of operation order as you make changes. This provides a line of defense against code changes to prevent bugs.


Variables

If you have done some scripting with Python or another programming language, you have an idea what a variable is. In mathematics, a variable is a named placeholder for an unspecified or unknown number.

You may have a variable x representing any real number, and you can multiply that variable without declaring what it is. In Example 1-3 we take a variable input x from a user, and multiply it by 3.

Example 1-3. A variable in Python that is then multiplied
x = int(input("Please input a number\n"))

product = 3 * x

print(product)

There are some standard variable names for certain variable types. If these variable names and concepts are unfamiliar, no worries! But the rest of you readers might recognize we use theta θ to denote angles and beta β for a parameter in a linear regression. Greek symbols make awkward variable names in Python, so we would likely name these variables theta and beta in Python as shown in Example 1-4.

Example 1-4. Greek variable names in Python
beta = 1.75
theta = 30.0

Note also that variable names can be subscripted so that several instances of a variable name can be used. For practical purposes, just treat these as separate variables. If you encounter variables x1, x2, and x3, just treat them as three separate variables as shown in Example 1-5.

Example 1-5. Expressing subscripted variables in Python
x1 = 3
x2 = 10
x3 = 44


Functions

Functions are expressions that define relationships between two or more variables. More specifically, a function takes input variables (also called domain variables or independent variables), plugs them into an expression, and then results in an output variable (also called a dependent variable).

Take this simple linear function y=2x+1. For any given x value, we solve the expression with that x to find y. When x = 1, then y = 3. When x = 2, y = 5. When x = 3, y = 7 and so on as shown in Table 1-1.

Table 1-1. Different values for y = 2x + 1
x    2x + 1      y
0    2(0) + 1    1
1    2(1) + 1    3
2    2(2) + 1    5
3    2(3) + 1    7

Functions are useful because they help predict the relationship between variables, such as how many fires y can we expect at x temperature.

Another convention you may see for the dependent variable y is to explicitly label it a function of x, such as f(x). So rather than express a function as y=2x+1 we can also express it as:

$$f(x) = 2x + 1$$

Example 1-6 shows how we can declare a mathematical function and iterate it in Python.

Example 1-6. Declaring a linear function in Python
def f(x):
    return 2 * x + 1

x_values = [0, 1, 2, 3]

for x in x_values:
    y = f(x)
    print(y)

When dealing with real numbers, a subtle but important feature of functions is they often have an infinite number of x values and resulting y values. Ask yourself this: how many x values can we put through the function y=2x+1? Rather than just 0, 1, 2, 3… why not 0, .5, 1, 1.5, 2, 2.5, 3 as shown in Table 1-2 below?

Table 1-2. Different values for y = 2x + 1
x      2x + 1        y
0.0    2(0) + 1      1
0.5    2(.5) + 1     2
1.0    2(1) + 1      3
1.5    2(1.5) + 1    4
2.0    2(2) + 1      5
2.5    2(2.5) + 1    6
3.0    2(3) + 1      7

Or why not do quarter steps for x? Or $\frac{1}{10}$ of a step? We can make these steps infinitely small, effectively showing y=2x+1 is a continuous function, where for every possible value of x there is a value for y. This segues us nicely to visualize our function as a graph as shown in Figure 1-1.

Figure 1-1. Graph for function y = 2x + 1

When we plot on a two-dimensional plane with two number lines (one for each variable) it is known as a Cartesian plane, x-y plane, or coordinate plane. We trace a given x value and then look up the corresponding y value, and plot the intersections as a line. Notice that due to the nature of real numbers (or decimals, if you prefer), there are an infinite number of x values. This is why when we plot the function f(x) we get a continuous line with no breaks in it. There are an infinite number of points on that line, or any part of that line.
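
To make the idea of ever finer steps concrete, here is a small sketch that evaluates y = 2x + 1 at evenly spaced x values. The helper name sample() and its parameters are assumptions made for this illustration only; a finite sample like this can only hint at the infinitely many points on the line.

```python
def f(x):
    return 2 * x + 1

def sample(step, start=0.0, stop=3.0):
    """Evaluate f at evenly spaced x values from start to stop (inclusive)."""
    count = round((stop - start) / step)
    return [(start + i * step, f(start + i * step)) for i in range(count + 1)]

half_steps = sample(0.5)
quarter_steps = sample(0.25)

print(len(half_steps))      # 7 points
print(len(quarter_steps))   # 13 points
print(quarter_steps[1])     # (0.25, 1.5)
```

Halving the step roughly doubles the number of points between 0 and 3, and nothing stops us from halving it again.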

If you want to plot this using Python, there are a number of charting libraries from Plotly to matplotlib. However, SymPy gives us a quick, clean way to plot a function. It uses matplotlib so make sure you have that package installed, otherwise it will print an ugly text-based graph to your console. After that, just declare the x variable to SymPy using symbols(), declare your function, and then plot it as shown in Example 1-7 and Figure 1-2.

Example 1-7. Charting a linear function in Python using SymPy
from sympy import *

x = symbols('x')
f = 2*x + 1
plot(f)

Figure 1-2. Using SymPy to graph a linear function

Example 1-8 and Figure 1-3 show another example, the function $y = x^2 + 1$.

Example 1-8. Charting a quadratic function in Python using SymPy
from sympy import *

x = symbols('x')
f = x**2 + 1
plot(f)

Figure 1-3. Using SymPy to graph a quadratic function

Notice in Figure 1-3 we do not get a straight line but rather get a smooth, symmetrical curve known as a parabola. It is continuous but is not linear, as it does not produce values in a straight line. Curvy functions like this are mathematically harder to work with, but we will learn some tricks to make it not so bad.

Note that functions can utilize multiple input variables and not just one. For example, we can have a function with independent variables x and y.

$$f(x, y) = 2x + 3y$$

Since we have 2 independent variables (x and y) and 1 dependent variable (the output of f(x,y)) we need to plot this graph on 3 dimensions to produce a plane of values rather than a line as shown in Example 1-9 and Figure 1-4.

Example 1-9. Declaring a function with 2 independent variables in Python
from sympy import *
from sympy.plotting import plot3d

x, y = symbols('x y')
f = 2*x + 3*y
plot3d(f)

Figure 1-4. Using SymPy to graph a 3-dimensional function

No matter how many independent variables you have, your function will typically still only output one dependent variable. When you solve for multiple dependent variables, you will likely be using separate functions for each one.


Exponents

Exponents multiply a number by itself a specified number of times. When you raise 2 to the 3rd power (expressed as $2^3$ using 3 as a superscript), that is multiplying three 2’s together:

$$2^3 = 2 \times 2 \times 2 = 8$$

The base is the variable or value we are exponentiating, and the exponent is the number of times we multiply the base value. For the expression $2^3$, 2 is the base and 3 is the exponent.

Exponents have a few interesting properties. Say we multiplied $x^2$ and $x^3$ together. Observe what happens below when I expand the exponents with simple multiplication, and then consolidate into a single exponent:

$$x^2 x^3 = (x \times x) \times (x \times x \times x) = x^{2+3} = x^5$$

When we multiply exponents together with the same base, we simply add the exponents, which is known as the product rule. Note that the base of all multiplied exponents must be the same for the product rule to apply.

Let’s explore division next. What happens when we divide $x^2$ by $x^5$?

$$\frac{x^2}{x^5} = \frac{x \times x}{x \times x \times x \times x \times x} = \frac{1}{x \times x \times x} = \frac{1}{x^3}$$

As you can see above, when we divide $x^2$ by $x^5$ we can cancel out two x’s in the numerator and denominator, leaving us with $\frac{1}{x^3}$. Without going off on a tangent about algebra rules, here is what happens. When a value exists in both the numerator and denominator, and there is only multiplication/division in the numerator and denominator (no addition/subtraction), we can cancel out that value.

This is a good point to introduce negative exponents, which are another way of expressing an exponent operation in the denominator of a fraction. To demonstrate, $\frac{1}{x^3}$ is the same as $x^{-3}$.

$$\frac{1}{x^3} = x^{-3}$$

Tying back to the product rule, we can see it applies to negative exponents too. To see this, let’s approach this problem a different way. We can express this division of two exponents by making the “5” exponent of $x^5$ negative, and then multiplying it with $x^2$. When you add a negative number, it is effectively performing subtraction. Therefore, the exponent product rule summing the multiplied exponents still holds up as shown below:

$$\frac{x^2}{x^5} = x^2 \times \frac{1}{x^5} = x^2 x^{-5} = x^{2 + (-5)} = x^{-3}$$

Now what about fractional exponents? This may sound like something new, but they really are just an alternative way to represent roots, such as the square root. As a brief refresher, a square root of 4 asks “what number multiplied by itself will give me 4?”, which of course is 2. Note here that $4^{1/2}$ is the same as $\sqrt{4}$.

$$\sqrt{4} = 4^{1/2} = 2$$

Cube roots are similar to square roots, but they seek a number multiplied by itself 3 times to give a result. A cube root of 8 is expressed as $\sqrt[3]{8}$, and asks “what number multiplied by itself 3 times gives me 8?” This number would be 2 because 2*2*2=8. In exponents, a cube root is expressed as a fractional exponent, and $\sqrt[3]{8}$ can be re-expressed as $8^{1/3}$.

$$\sqrt[3]{8} = 8^{1/3} = 2$$

To bring it back full circle, what happens when you multiply three cube roots of 8 together? This will undo the cube root and yield 8. Alternatively, if we express the cube root as the fractional exponent $8^{1/3}$, it becomes clear we add the exponents together to get an exponent of 1, and that also undoes the cube root.

$$\sqrt[3]{8} \times \sqrt[3]{8} \times \sqrt[3]{8} = 8^{1/3} \times 8^{1/3} \times 8^{1/3} = 8^{1/3 + 1/3 + 1/3} = 8^1 = 8$$

And one last property: an exponent of an exponent multiplies the exponents together. This is known as the power rule. So $(8^3)^2$ would simplify to $8^6$.

$$(8^3)^2 = 8^{3 \times 2} = 8^6$$

If you are skeptical why this is, try expanding it and you will see the product rule makes it clear:

$$(8^3)^2 = 8^3 \times 8^3 = 8^{3+3} = 8^6$$

Lastly, what does it mean when we have a fractional exponent with a numerator other than 1, such as $8^{2/3}$? Well, that is taking the cube root of 8 and then squaring it. Take a look below:

$$8^{2/3} = (8^{1/3})^2 = 2^2 = 4$$

And yes, irrational numbers can serve as exponents, like $8^\pi$, which is approximately 687.2913. This may feel unintuitive, and understandably so! In the interest of time we will not dive deep into this as it requires some calculus. But essentially, we can calculate irrational exponents by approximating with a rational number. This is effectively what computers do since they can only compute to so many decimal places anyway.

For example, Pi π has an infinite number of decimal places. But if we take the first 11 digits, 3.1415926535, we can approximate Pi as a rational number 31415926535 / 10000000000. Sure enough, this gives us approximately 687.2913.

$$8^\pi \approx 8^{31415926535 / 10000000000} \approx 687.2913$$
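
If you would like to check these exponent properties numerically rather than take them on faith, the following sketch does so with plain Python. The base x = 3.0 is an arbitrary choice for illustration, and isclose() is used because floating point results are approximate.

```python
from math import isclose, pi

x = 3.0  # arbitrary positive base chosen for these checks

product_rule = isclose(x**2 * x**3, x**5)         # multiplying adds exponents
negative_exponent = isclose(x**2 / x**5, x**-3)   # dividing subtracts exponents
square_root = isclose(4 ** (1 / 2), 2.0)          # fractional exponent as a root
cube_root = isclose(8 ** (1 / 3), 2.0)
power_rule = (8**3) ** 2 == 8**6                  # exponent of an exponent
two_thirds = isclose(8 ** (2 / 3), 4.0)           # cube root of 8, then squared

print(product_rule, negative_exponent, square_root,
      cube_root, power_rule, two_thirds)          # all True

print(8 ** pi)  # approximately 687.29
```

The last line uses the float approximation of Pi that ships with Python, which is itself a rational stand-in for the irrational value.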



Logarithms

A logarithm is a math function that finds a power for a specific number and base. It may not sound interesting at first, but it actually has many applications across every kind of science and engineering. From measuring earthquakes to managing volume on your stereo, the logarithm is found everywhere. It also finds its way into machine learning and data science a lot.

Start your thinking by asking “2 raised to what power gives me 8?” One way to express this mathematically is to use an x for the exponent:

$$2^x = 8$$

We intuitively know the answer, x=3, but we need a more elegant way to express this common math operation. This is what the log() function is for.

$$\log_2 8 = x$$

As you can see in the logarithm expression above, we have a base 2 and are finding a power to give us 8. More generally, we can re-express a variable exponent as a logarithm:

$$a^x = b$$

$$\log_a b = x$$

Algebraically speaking, this is a way of isolating the x!

Example 1-10 shows how we calculate this logarithm in Python.

Example 1-10. Using the log function in Python
from math import log

# 2 raised to what power gives me 8?
x = log(8, 2)

print(x) # prints 3.0

When you do not supply a base argument to a log() function on a computer platform, it will typically have a default base. In some fields, like earthquake measurements, the default base for the log is 10, but in data science, the default base for the log is Euler’s number e. Python uses the latter, and we will talk about it shortly.

Just like exponents, logarithms have several properties when it comes to multiplication, division, exponentiation, and so on. In the interest of time and focus, I will only talk about the multiplication property. The key idea to focus on is a logarithm finds an exponent for a given base to result in a certain number.

If you need to dive into logarithmic properties, Table 1-3 displays exponent and logarithm behaviors side by side for reference.

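Table 1-3. Properties for Exponents and Logarithms
Operator          Exponent Property               Logarithm Property
Multiplication    $x^m \times x^n = x^{m+n}$      $\log(a \times b) = \log(a) + \log(b)$
Division          $\frac{x^m}{x^n} = x^{m-n}$     $\log(\frac{a}{b}) = \log(a) - \log(b)$
Exponentiation    $(x^m)^n = x^{mn}$              $\log(a^n) = n \times \log(a)$
Zero Exponent     $x^0 = 1$                       $\log(1) = 0$
Inverse           $x^{-1} = \frac{1}{x}$          $\log(x^{-1}) = \log(\frac{1}{x}) = -\log(x)$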
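The multiplication property, along with the other standard logarithm properties for division, exponentiation, zero, and inverses, can be spot-checked numerically. This is only a sketch with arbitrarily chosen values a, b, and n; it also shows the default base e next to the dedicated base 2 and base 10 helpers in Python's math module.

```python
from math import e, isclose, log, log10, log2

a, b, n = 8.0, 4.0, 3.0  # arbitrary positive values for illustration

multiplication = isclose(log(a * b), log(a) + log(b))
division = isclose(log(a / b), log(a) - log(b))
exponentiation = isclose(log(a**n), n * log(a))
zero_exponent = log(1) == 0.0
inverse = isclose(log(1 / a), -log(a))

print(multiplication, division, exponentiation, zero_exponent, inverse)

# log() defaults to base e, while log2() and log10() fix the base
print(log(e))        # 1.0
print(log2(8))       # 3.0
print(log10(1000))   # 3.0
```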

Euler’s Number and Natural Logarithms

There is a special number that shows up quite a bit in math called Euler’s number e. It is transcendental just like Pi π and is approximately 2.71828.

Back in high school, my Calculus teacher demonstrated Euler’s number in several exponential problems. Finally I asked “Mr. Nowe, what is e anyway? Where does it come from?” I remember never being fully satisfied with the explanations involving rabbit populations and other natural phenomena. I hope to give a more satisfying explanation here.

Here is how I like to discover Euler’s number. Let’s say you loan $100 to somebody with 20% interest annually. Typically interest will be compounded monthly, so the interest each month would be .20/12=.01666. How much will the loan balance be after two years? To keep it simple, let’s assume the loan does not require payments (and no payments are made) until the end of those two years.

Putting together the exponent concepts we learned so far (or perhaps pulling it out of a finance textbook), we can come up with a formula to calculate interest. It consists of a balance A for a starting investment P, interest rate r, time span t (number of years), and periods n (number of months in each year). Here it is as follows:

$$A = P \times (1 + \frac{r}{n})^{nt}$$

So if we were to compound interest every month, the loan would grow to $148.69 as calculated below:

$$A = 100 \times (1 + \frac{.20}{12})^{12 \times 2} \approx 148.6914618$$

If you want to do this in Python, try it out with the code in Example 1-11.

Example 1-11. Calculating compound interest in Python
p = 100
r = .20
t = 2.0
n = 12

a = p * (1 + (r/n))**(n * t)

print(a) # prints 148.69146179463576

But what if we compounded interest daily? What happens then? Change n to 365.

$$A = 100 \times (1 + \frac{.20}{365})^{365 \times 2} \approx 149.1661279$$

Huh! If we compound our interest daily instead of monthly, we would earn 47.4666 cents more at the end of two years. If we got greedy, why not compound every hour as shown below? Will that give us even more? There are 8,760 hours in a year, so set n to that value:

$$A = 100 \times (1 + \frac{.20}{8760})^{8760 \times 2} \approx 149.1817886$$

Ah, we squeezed out roughly 2 cents more in interest! But are we experiencing a diminishing return? Let’s try to compound every minute! Note that there are 525,600 minutes in a year, so let’s set that value to n:

$$A = 100 \times (1 + \frac{.20}{525600})^{525600 \times 2} \approx 149.1824584$$

Okay we are only gaining smaller and smaller fractions of a cent the more frequently we compound. So if I keep making these periods infinitely smaller to the point of compounding continuously, where does this lead to?
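
The diminishing return is easy to see by running the same formula across all four compounding frequencies at once. This sketch wraps the formula in a hypothetical helper called compound(), a name introduced here only for illustration.

```python
def compound(p, r, t, n):
    """Balance after t years at annual rate r, compounded n times per year."""
    return p * (1 + r / n) ** (n * t)

# Number of compounding periods per year for each frequency
periods = {"monthly": 12, "daily": 365, "hourly": 8760, "every minute": 525600}

balances = {name: compound(100, 0.20, 2, n) for name, n in periods.items()}

for name, balance in balances.items():
    print(f"{name}: {balance:.4f}")
```

Each step up in frequency raises the balance, but by a smaller amount than the step before it.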

Let me introduce you to Euler’s number e, which is approximately 2.71828. Here is the formula to compound “continuously,” meaning we are compounding nonstop:

$$A = P \times e^{rt}$$

Returning to our example, let’s calculate the balance of our loan after 2 years if we compounded continuously:

$$A = 100 \times e^{.20 \times 2} \approx 149.1824698$$

Okay, this is not too surprising considering compounding every minute got us a balance of 149.1824584. That got us really close to our value of 149.1824698 when compounding continuously.

Typically you use e as an exponent base in Python, Excel, and other platforms using the exp() function. You will find that e is so commonly used, it is the default base for both exponent and logarithm functions.

Example 1-12 calculates continuous interest in Python using the exp() function.

Example 1-12. Calculating continuous interest in Python
from math import exp

p = 100 # principal, starting amount
r = .20 # interest rate, by year
t = 2.0 # time, number of years

a = p * exp(r*t)

print(a) # prints 149.18246976412703

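So where do we derive this constant e? Compare the compounding interest formula and the continuous interest formula. They structurally look similar, but with some differences:

$$A = P \times (1 + \frac{r}{n})^{nt}$$

$$A = P \times e^{rt}$$

More technically speaking, e is the resulting value of the expression $(1 + \frac{1}{n})^n$ as n forever gets bigger and bigger, thus approaching infinity. Try experimenting with increasingly large values for n. By making it larger and larger you will notice something:

$$(1 + \frac{1}{100})^{100} \approx 2.70481$$

$$(1 + \frac{1}{1000})^{1000} \approx 2.71692$$

$$(1 + \frac{1}{10000})^{10000} \approx 2.71815$$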

As you make n larger, there is a diminishing return and it converges approximately on the value 2.71828, which is our value e. You will find this number e used not just in studying populations and their growth. It plays a key role in many areas of mathematics.
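
Here is a minimal sketch of that experiment in plain Python. The function name approximate_e() is an assumption made for this example; it simply evaluates the expression for a given n so we can watch the result settle toward e.

```python
from math import e

def approximate_e(n):
    """Evaluate (1 + 1/n)**n for a given n."""
    return (1 + 1 / n) ** n

for n in (100, 1_000, 10_000, 1_000_000):
    print(n, approximate_e(n))

# Distance between the approximation and Python's built-in constant
gap = abs(approximate_e(1_000_000) - e)
print(gap)  # a small gap that shrinks as n grows
```

For extremely large n, floating point rounding eventually limits how close this direct calculation can get, so treat it as a demonstration rather than a way to compute e.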

Later in the book, we will use it to build normal distributions and logistic regressions from scratch.


The phenomenon we observed above, where increasing or decreasing a value infinitely causes an output variable to approach a number (without ever reaching that number), is what we call a limit in calculus.

I want to avoid too much math symbology in this book and use intuitive Python code instead. But in math textbooks this is how you would see limits expressed for this example:

$$\lim_{n \to \infty} (1 + \frac{1}{n})^n = e = 2.71828...$$

Another property of Euler’s number is that its exponential function is its own derivative, which is convenient for exponential and logarithmic functions. We will learn about derivatives later in this chapter. In many applications where the base does not really matter, we pick the one that results in the simplest derivative, and that is Euler’s number. That is also why it is the default base in many data science functions.

Use e to Predict Event Probability Over Time

Let’s look at one more use case for e that you might find useful. Let’s say you are a manufacturer of propane tanks. Obviously you do not want the tank to leak or else that could create hazards, particularly around open flames and sparks. Testing a new tank design, your engineer reports that there is a 5% chance in a given year that it will leak.

You know this is already an unacceptably high number, but you want to know how this probability compounds over time. You now ask yourself “what is the probability of a leak happening within 2 years? 5 years? 10 years?” The more time that passes, would the probability of seeing the tank leak not only get higher? Euler’s number can come to the rescue again!

$$P_{leak} = 1.0 - e^{-\lambda T}$$

The function above models the probability of an event over time, or in this case the tank leaking after T time. e again is Euler’s number, lambda λ is the failure rate across each unit of time (each year), and T is the amount of time gone by (number of years).

If we graph this function where T is our x-axis, the probability of a leak is our y-axis, and λ=.05, here is what we get in Figure 1-5:

Figure 1-5. Predicting the probability of a leak over time

Here is how we model this function in Python for λ=.05 and T=5 years in Example 1-13.

Example 1-13. Code for predicting the probability of a leak over time
from math import exp

# Probability of leak in 1 year
p_leak = .05

# number of years
t = 5

# Probability of leak within 5 years
# 0.22119921692859512
p_leak_5_years = 1.0 - exp(-p_leak * t)

print("PROBABILITY OF LEAK WITHIN 5 YEARS: {}".format(p_leak_5_years))

The probability of a tank failure within 2 years is about 9.5%, within 5 years about 22.1%, and within 10 years about 39.3%. The more time that passes, the more likely the tank is going to leak. We can generalize this formula to predict any event with a probability in a given period, and see how that probability shifts over different periods of time.
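
Those three figures can be reproduced by looping over the time spans of interest. The helper name p_event_within() is hypothetical and used only for this sketch; it applies the same formula with λ = .05.

```python
from math import exp

def p_event_within(rate, t):
    """Probability of at least one event within t periods at the given rate."""
    return 1.0 - exp(-rate * t)

leak_rate = 0.05  # failure rate per year, as in the propane tank example

probabilities = {t: p_event_within(leak_rate, t) for t in (2, 5, 10)}

for t, p in probabilities.items():
    print(f"within {t} years: {p:.3f}")
```

Swapping in a different rate or a different set of time spans is all it takes to reuse the sketch for another event.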

Natural Logarithms

When we use e as our base for a logarithm, we call it a natural logarithm. Depending on the platform, we may use ln() instead of log() to specify a natural logarithm. So rather than express a natural logarithm as $\log_e 10$ to find the power raised on e to get 10, we would shorthand it as ln(10).

$$\log_e 10 = \ln(10)$$

However in Python, a natural logarithm is specified by the log() function. As said earlier, the default base for the log() function is e. Just leave the second argument for the base empty and it will default to using e as the base shown in Example 1-14.

Example 1-14. Calculating natural logarithm of 10 in Python
from math import log

# e raised to what power gives us 10?
x = log(10)

print(x) # prints 2.302585092994046

We will use e in a number of places throughout this book. Feel free to experiment with exponents and logarithms for a while using Excel, Python, or any other platform of your choice.


Limits

As we have seen with Euler’s number, some interesting ideas emerge when we forever increase or decrease an input variable and the output variable keeps approaching a value but never reaches it. Let’s formally explore this idea.

Take this function, which is plotted in Figure 1-6:

$$f(x) = \frac{1}{x}$$

Figure 1-6. A function that forever approaches 0 in either direction but never reaches 0

Notice that as x forever increases (or decreases for negative numbers) f(x) gets closer to 0. Fascinatingly, f(x) never actually reaches 0. It just forever keeps getting closer.

Therefore, the fate of this function is as x forever extends into infinity, it will keep getting closer to 0 but never reach 0. The way we express a value that is forever being approached, but never reached, is through a limit.

$$\lim_{x \to \infty} \frac{1}{x} = 0$$

The way we read this is “as x approaches infinity, the function 1/x approaches 0 (but never reaches 0).” You will see this kind of “approach but never touch” behavior a lot especially when we dive into derivatives and integrals.
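
Before reaching for symbolic tools, we can watch this behavior with plain Python by feeding the function larger and larger inputs. This is only a numerical sketch: it suggests the limit but does not prove it.

```python
def f(x):
    return 1 / x

# Evaluate f at 10, 100, 1000, ... up to 1,000,000
values = [f(10 ** k) for k in range(1, 7)]

print(values)  # each value is smaller than the last, but none is 0
```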

Using SymPy, we can calculate what value we approach for $f(x) = \frac{1}{x}$ as x approaches infinity.

Example 1-15. Using SymPy to calculate limits
from sympy import *

x = symbols('x')
f = 1 / x
result = limit(f, x, oo)

print(result) # 0

As you have seen, we discovered Euler’s number e this way too. It is the result of forever extending n into infinity for this function.

$$\lim_{n \to \infty} (1 + \frac{1}{n})^n = e = 2.71828...$$

Funny enough, when we calculate Euler’s number with limits in SymPy (shown in Example 1-16), SymPy immediately recognizes it as Euler’s number. We can call evalf() so we can actually display it as a number.

Example 1-16. Using SymPy to calculate the limit for Euler’s number
from sympy import *

n = symbols('n')
f = (1 + (1/n))**n
result = limit(f, n, oo)

print(result) # E
print(result.evalf()) # 2.71828182845905


Derivatives

Let’s go back to talking about functions and look at them from a calculus perspective, starting with derivatives. A derivative tells the slope of a function, and it is useful to measure the rate of change at any point in a function.

Why do we care about derivatives? They are often used in machine learning and other mathematical algorithms, especially with gradient descent. When the slope is 0, that typically means we are at a minimum or maximum of an output variable. This concept will be useful later when we do linear regression, logistic regression, and neural networks.

But let’s start with a simple example. A derivative provides the slope at a given x value. Let’s take a look at the function $f(x) = x^2$ in Figure 1-7. How “steep” is the curve at x = 2?

Figure 1-7. Observing steepness at a given part of the function

Notice above we can measure “steepness” at any point in the curve, and we can visualize this with a tangent line. Think of a tangent line as a straight line that “just touches” the curve at a given point. It also provides the slope at a given point. You can crudely estimate a tangent line at a given x value by creating a line intersecting that x value and a really close neighboring x value on the function.

Take x = 2 and a nearby value x = 2.1, which when passed to the function $f(x) = x^2$ will yield f(2) = 4 and f(2.1) = 4.41 as shown in Figure 1-8. The resulting line that passes through these two points has a slope of 4.1.

Figure 1-8. A crude way of calculating slope

You can quickly calculate the slope m between two points using the simple rise-over-run formula:

$$m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{4.41 - 4.0}{2.1 - 2.0} = 4.1$$

If I made the x step between the two points even smaller, like x = 2 and x = 2.00001 which would result in f(2) = 4 and f(2.00001) = 4.00004, that would get really close to the actual slope of 4. So the smaller our step is to the neighboring value, the closer we get to the slope value at a given point in the curve. Like so many important concepts in math, we find something meaningful as we approach infinitely large or infinitely small values.

Example 1-17 shows a derivative calculator implemented in Python.

Example 1-17. A derivative calculator in Python
def derivative_x(f, x, step_size):
    m = (f(x + step_size) - f(x)) / ((x + step_size) - x)
    return m

def my_function(x):
    return x**2

slope_at_2 = derivative_x(my_function, 2, .00001)

print(slope_at_2) # prints 4.000010000000827
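
To see the estimate tighten as the step shrinks, we can run the same rise-over-run idea across several step sizes. This sketch simplifies the denominator to step_size, which is algebraically the same as (x + step_size) - x, and the list of steps is an arbitrary choice for illustration.

```python
def derivative_x(f, x, step_size):
    """Estimate the slope of f at x using a nearby point step_size away."""
    return (f(x + step_size) - f(x)) / step_size

def my_function(x):
    return x ** 2

steps = [0.1, 0.01, 0.001, 0.0001]
slopes = [derivative_x(my_function, 2, step) for step in steps]

for step, slope in zip(steps, slopes):
    print(step, slope)  # slopes move toward 4 as the step shrinks
```

Be aware that steps that are far too small can make the estimate worse rather than better, because floating point rounding starts to dominate the subtraction in the numerator.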

Now the good news is there is a cleaner way to calculate the slope anywhere on a function. The bad news is it requires some “pencil and paper” math work to hand-calculate a function into its derivative using some rules. If these handwritten derivatives make you uncomfortable, do not worry. We have already been using SymPy to plot graphs, but I will show you how it can also do “pencil and paper” work for you using the magic of symbolic computation.

When you encounter an exponential function like f(x)=x2 the derivative function will make the exponent a multiplier and then decrement the exponent by 1, leaving us with the derivative ddxx2=2x. The ddx indicates a derivative with respect to x, which says we are building a derivative targeting the x value to get its slope. So if we want to find the slope at x = 2, and we have the derivative function, we just plug in that x value to get the slope.


If you intend on learning these rules to hand-calculate derivatives, there are plenty of Calculus books for that. But there are some nice tools to calculate derivatives symbolically for you. The Python library SymPy is one that is free and open source, and it nicely adapts to using the Python syntax. Example 1-18 shows how we calculate the derivative for $f(x) = x^2$ in SymPy.

Example 1-18. Calculating a derivative in SymPy
from sympy import *

# Declare 'x' to SymPy
x = symbols('x')

# Now just use Python syntax to declare function
f = x**2

# Calculate the derivative of the function
dx_f = diff(f)
print(dx_f) # prints 2*x

Wow! So by declaring variables using the symbols() function in SymPy, I can then proceed to use normal Python syntax to declare my function. After that I can use diff() to calculate the derivative function for me. In Example 1-19 we can then take our derivative function back to plain Python and simply declare it as another function.

Example 1-19. A derivative calculator in Python
def f(x):
    return x**2

def dx_f(x):
    return 2*x

slope_at_2 = dx_f(2.0)

print(slope_at_2) # prints 4.0
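
If you would rather not retype the derivative by hand, SymPy's lambdify() function can convert a symbolic expression into an ordinary Python function. Here is a minimal sketch of that route, assuming SymPy is installed; the name `dx_f_py` is my own choice for illustration.

```python
from sympy import symbols, diff, lambdify

# Build the symbolic derivative of x**2 with respect to x
x = symbols('x')
f = x**2
dx_f = diff(f, x)

# Convert the symbolic expression into a regular callable Python function
dx_f_py = lambdify(x, dx_f)

slope_at_2 = dx_f_py(2.0)
print(slope_at_2)
```

This keeps the symbolic result and the numeric function in sync, which matters more once the derivative is too messy to transcribe comfortably.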

If you want to keep using SymPy, you can call the subs() method on the SymPy expression dx_f from Example 1-18 to swap the x variable with the value 2, as shown in Example 1-20.

Example 1-20.
# Calculate the slope at x = 2
print(dx_f.subs(x,2)) # prints 4

Another concept we will encounter in this book is partial derivatives, which are derivatives on functions that have multiple input variables. Let’s take the function $f(x, y) = 2x^3 + 3y^3$. The x and y variables each get their own derivatives $\frac{d}{dx}$ and $\frac{d}{dy}$. These represent the slope values with respect to each variable on a multidimensional surface. These are the derivatives for x and y, followed by the SymPy code to calculate those derivatives:

$$\frac{d}{dx}\left(2x^3 + 3y^3\right) = 6x^2 \qquad \frac{d}{dy}\left(2x^3 + 3y^3\right) = 9y^2$$


Example 1-21 shows how we calculate the partial derivatives for x and y respectively with SymPy, and Figure 1-9 shows the resulting plot.

Example 1-21.
from sympy import *
from sympy.plotting import plot3d

# Declare x and y to SymPy
x,y = symbols('x y')

# Now just use Python syntax to declare function
f = 2*x**3 + 3*y**3

# Calculate the partial derivatives for x and y
dx_f = diff(f, x)
dy_f = diff(f, y)

print(dx_f) # prints 6*x**2
print(dy_f) # prints 9*y**2

# plot the function
plot3d(f)

Figure 1-9. Plotting a 3-dimensional exponential function

Think of it this way. Rather than finding the slope on a function of a single variable, we have slopes with respect to multiple variables in several directions. For each given variable derivative, we assume the other variables are held constant. Take a look at the 3D graph of $f(x, y) = 2x^3 + 3y^3$ in Figure 1-9 and you will see we have slopes in two directions for two variables.

So for (x,y) values (1,2), the slope with respect to x is $6(1)^2 = 6$ and the slope with respect to y is $9(2)^2 = 36$.
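
We can sanity-check the idea of "holding the other variable constant" with the same crude rise-over-run trick from earlier, nudging only one input at a time. This is a rough sketch; the helper names `partial_x` and `partial_y` and the default step size are my own assumptions, not something from SymPy.

```python
def f(x, y):
    return 2 * x**3 + 3 * y**3

def partial_x(f, x, y, step_size=1e-6):
    # Nudge x only; y is held constant
    return (f(x + step_size, y) - f(x, y)) / step_size

def partial_y(f, x, y, step_size=1e-6):
    # Nudge y only; x is held constant
    return (f(x, y + step_size) - f(x, y)) / step_size

slope_x = partial_x(f, 1, 2)
slope_y = partial_y(f, 1, 2)

print(slope_x, slope_y)
```

The two numbers should land very close to 6 and 36, which is what the derivative functions 6x² and 9y² evaluate to at the point (1, 2).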


The opposite of a derivative is an integral, which finds the area under the curve for a given range. We will find ourselves using integrals quite a bit in this book to find areas under probability distributions.

I want to take an intuitive approach for learning integrals called Riemann sums, one that flexibly adapts to any function you please.

First, let’s point out that finding the area for a range under a straight line is easy. Let’s say I have a function f(x)=2x and I want to find the area under the line between 0 and 1, as shaded in Figure 1-10.

Figure 1-10. Calculating an area under a linear function

Notice that I am finding the area bounded between the line and the x-axis, and in the x range 0 to 1.0. If you recall basic geometry formulas, the area A for a triangle is $A = \frac{1}{2}bh$ where b is the length of the base and h is the height. We can visually spot in Figure 1-10 that b = 1 and h = 2. So plugging into the formula we get an area of 1.0, as calculated below:

$$A = \frac{1}{2}bh = \frac{1}{2} \cdot 1 \cdot 2 = 1.0$$


That was not bad, right? But let’s look at a function that is difficult to find the area under: $f(x) = x^2 + 1$. What is the area between 0 and 1 as shaded in Figure 1-11?

Figure 1-11. Calculating area under nonlinear functions is less straightforward

Again we are only interested in the area below the curve and above the x-axis, only within the x range between 0 and 1. The curviness here does not give us a clean geometric formula to find the area, but here is a clever little hack you can do.

What if we packed 5 rectangles of equal width under the curve as shown in Figure 1-12, where the height of each one extends from the x-axis to where its midpoint touches the curve?

Figure 1-12. Packing rectangles under a curve to approximate area

The area of a rectangle is A = length * width, so we could easily sum the areas of the rectangles. Would that give us a good approximation of the area under the curve? What if we packed 100 rectangles? 1000? 100,000? As we increase the number of rectangles while decreasing their width, would we not get closer to the area under the curve? Yes we would, and it is yet another case where we increase/decrease something towards infinity to approach an actual value.

Let’s try it out in Python! First we need a function that approximates an integral, which we will call approximate_integral(). The arguments a and b will specify the min and max of the x range respectively. n will be the number of rectangles to pack, and f will be the function we are integrating. We implement the function in Example 1-24, and then use it to integrate our function $f(x) = x^2 + 1$ with 5 rectangles, between 0.0 and 1.0.

Example 1-24. An integral approximation in Python
def approximate_integral(a, b, n, f):
    delta_x = (b - a) / n
    total_sum = 0

    for i in range(1, n + 1):
        midpoint = 0.5 * (2 * a + delta_x * (2 * i - 1))
        total_sum += f(midpoint)

    return total_sum * delta_x

def my_function(x):
    return x**2 + 1

area = approximate_integral(a=0, b=1, n=5, f=my_function)

print(area) # prints 1.33

So we get an area of 1.33. What happens if we use 1000 rectangles? Let’s try it out in Example 1-25.

Example 1-25. An integral approximation in Python
area = approximate_integral(a=0, b=1, n=1000, f=my_function)

print(area) # prints 1.333333250000001

Okay we are getting some more precision here, and getting some more decimal places. What about 1 million rectangles as shown in Example 1-26?

Example 1-26. An integral approximation in Python
area = approximate_integral(a=0, b=1, n=1_000_000, f=my_function)

print(area) # prints 1.3333333333332733

Okay I think we are getting a diminishing return here, and are converging on the value $1.\overline{3}$ where the “.333” part is forever recurring. If this is a rational number, it is likely $\frac{4}{3} = 1.\overline{3}$. As we increase the number of rectangles, the approximation starts to reach its limit at smaller and smaller decimals.
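
To make that diminishing return visible, here is a small sketch that measures how far each approximation sits from 4/3 as the rectangle count grows. Treating 4/3 as the target is an assumption at this point in the discussion, and the `errors` list is just my own bookkeeping.

```python
def approximate_integral(a, b, n, f):
    delta_x = (b - a) / n
    total_sum = 0
    for i in range(1, n + 1):
        # Midpoint of the i-th rectangle
        midpoint = a + delta_x * (i - 0.5)
        total_sum += f(midpoint)
    return total_sum * delta_x

def my_function(x):
    return x**2 + 1

target = 4 / 3  # the value we suspect the areas are converging on

errors = []
for n in (5, 50, 500, 5000):
    area = approximate_integral(0, 1, n, my_function)
    errors.append(abs(area - target))
    print(n, area, errors[-1])
```

Each tenfold increase in rectangles should shrink the gap by roughly a factor of one hundred, so the extra precision gets expensive quickly.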

Now that we have some intuition on what we are trying to achieve and why, let’s take a more exact approach with SymPy, which happens to support rational numbers, in Example 1-27.

Example 1-27.
from sympy import *

# Declare 'x' to SymPy
x = symbols('x')

# Now just use Python syntax to declare function
f = x**2 + 1

# Calculate the integral of the function with respect to x
# for the area between x = 0 and 1
area = integrate(f, (x, 0, 1))

print(area) # prints 4/3

Cool! So the area actually is 4/3, which is what our rectangle approach increasingly approached the more rectangles we had. Unfortunately, the floating-point numbers that plain Python (and many programming languages) use by default only approximate decimals, but computer algebra systems like SymPy give us exact rational numbers.
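
As a side note, Python's standard library does include a fractions module with an exact Fraction type. It will not do symbolic integration like SymPy, but it can keep our rectangle sums free of floating-point noise. Here is a minimal sketch; the function name `approximate_integral_exact` is my own, and the result is still only the 5-rectangle approximation, not the true area.

```python
from fractions import Fraction

def approximate_integral_exact(a, b, n, f):
    # Convert the bounds to exact rationals so every later step stays exact
    a, b = Fraction(a), Fraction(b)
    delta_x = (b - a) / n
    total_sum = Fraction(0)
    for i in range(1, n + 1):
        # Midpoint of the i-th rectangle, kept as a Fraction
        midpoint = a + delta_x * Fraction(2 * i - 1, 2)
        total_sum += f(midpoint)
    return total_sum * delta_x

def my_function(x):
    return x**2 + 1

area = approximate_integral_exact(0, 1, 5, my_function)

print(area)         # prints 133/100
print(float(area))  # prints 1.33
```

The exact 133/100 shows that 5 rectangles fall short of 4/3 by precisely 1/300, something the decimal output can only hint at.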


In this chapter we covered some foundations we will use for the rest of this book. From number theory to logarithms and calculus integrals, we cherry-picked some important mathematical concepts that have relevance in data science, machine learning, and analytics. You may have questions about why these concepts are useful. That will come next!

Before we move on to discuss probability, take a little bit of time to skim these concepts one more time and then do the following exercises. You can always revisit this chapter as you progress through this book, and refresh as necessary when you start applying these mathematical ideas.


  1. Is the value 62.6738 rational or irrational? Why or why not?

  2. Evaluate the expression: $10^{7} \cdot 10^{-5}$

  3. Evaluate the expression: $81^{\frac{1}{2}}$

  4. Evaluate the expression: $25^{\frac{3}{2}}$

  5. Assuming no payments are made, how much would a $1000 loan be worth at 5% interest compounded monthly after 3 years?

  6. Assuming no payments are made, how much would a $1000 loan be worth at 5% interest compounded continuously after 3 years?

  7. You bought a 3D printer and read there is a 3% probability the motor will jam within 1 month. What is the probability it will jam within 6 months?

  8. For the function $f(x) = 3x^2 + 1$ what is the slope at x = 3?

  9. For the function $f(x) = 3x^2 + 1$ what is the area under the curve for x between 0 and 2?
