Using NumPy functions in Jupyter

NumPy is a Python package that provides multidimensional arrays and routines for array processing. We bring in the NumPy package using the import numpy as np statement. In particular, the NumPy package defines the array() function, which returns a NumPy array object (ndarray) with extensive functionality.

The NumPy array processing functions range from the mundane, such as the min() and max() functions (which provide the minimum and maximum values over the array dimensions provided), to more interesting utility functions for producing histograms and calculating correlations using the elements of an array.
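As a minimal sketch of what "over the array dimensions provided" means, the following assumes a small 2 x 3 array (the values are illustrative only) and passes the axis argument to min() and max():

```python
import numpy as np

# illustrative 2 x 3 array (values chosen only for this sketch)
m = np.array([[4, 9, 2], [7, 1, 8]])

overall_min = m.min()      # smallest value in the whole array
col_max = m.max(axis=0)    # maximum of each column
row_min = m.min(axis=1)    # minimum of each row

print("overall min =", overall_min)
print("max of each column =", col_max)
print("min of each row =", row_min)
```

With no axis argument, the whole array is reduced to a single value; with axis=0 or axis=1, the reduction runs down the columns or along the rows, respectively.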

With NumPy, you can manipulate arrays in many ways. For example, we will go over some of these functions with the following scripts, where we will use NumPy to:

  • Create an array
  • Calculate the max value in the array
  • Calculate the min value in the array
  • Determine the sum across the second axis

# NumPy arrays
import numpy as np

# create a 3 x 3 array 'a' from three rows of three values
a = np.array([[1, 1, 2], [3, 5, 8], [13, 21, 34]])
print("Array contents", a)

# maximum value in the array
print("max value = ", a.max())

# minimum value in the array
print("min value = ", a.min())

# sum across the 2nd axis (axis=1), giving one sum per row
print("sum across 2nd axis", a.sum(axis=1))

If we transfer this script into a Python notebook and execute the cell, the display shows the array contents, a max value of 34, a min value of 1, and the row sums 4, 16, and 68.

We can use the following script to work over arrays with the more interesting histogram and correlate functions:

import numpy as np
import random

# build up 2 sets of random numbers

# set up an empty integer array with 2 rows and 1000 columns
numbers = np.empty([2, 1000], int)

# set seed so we can repeat results
random.seed(137)

# populate the array (randint includes both endpoints)
for num in range(0, 1000):
    numbers[0, num] = random.randint(0, 1000)
    numbers[1, num] = random.randint(0, 1000)

# produce a histogram of all the values in 10 equal-width bins
(hist, bins) = np.histogram(numbers, bins=10, range=(0, 1000))
print("Histogram is ", hist)

# calculate the cross-correlation between the 2 rows
corrs = np.correlate(numbers[0], numbers[1], mode='valid')
print("Correlation of the two rows is ", corrs)

In this script, we are:

  • Populating a two-row array with random numbers
  • Producing a histogram of the values from both rows within 100 point ranges
  • And, finally, determining the correlation between the two rows (np.correlate returns the sum of the products of the paired values, so the result should be a very large number)
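Note that np.correlate computes a cross-correlation (a sum of products), not a coefficient between -1 and 1. If a normalized measure is wanted, np.corrcoef is one option; it is a swapped-in function, not part of the script above. The following is a minimal sketch that also assumes NumPy's own generator, np.random.default_rng, in place of the random module, so its numbers will differ from those of the script:

```python
import numpy as np

# assumption: NumPy's generator replaces the random module used in the script
rng = np.random.default_rng(137)

# 2 rows, 1000 columns; the upper bound 1001 is exclusive
numbers = rng.integers(0, 1001, size=(2, 1000))

# cross-correlation of equal-length rows in 'valid' mode: a one-element array
cross = np.correlate(numbers[0], numbers[1], mode='valid')

# Pearson correlation coefficient between the two rows
coeff = np.corrcoef(numbers[0], numbers[1])[0, 1]

print("Cross-correlation:", cross)
print("Correlation coefficient:", coeff)
```

Because the two rows are generated independently, the coefficient should sit close to zero even though the cross-correlation value is large.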

After entering this script into a Jupyter Notebook and executing the cell, we see the histogram counts and the correlation value in the output. It makes sense that the buckets are very close in size, since every value is drawn uniformly from the same range.
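To see where the buckets begin and end, we can inspect the bin edges that np.histogram returns alongside the counts. This is only a sketch: it assumes np.random.default_rng as the source of values rather than the random module, so the counts will not match the script's output exactly.

```python
import numpy as np

# assumption: values come from NumPy's generator, not the random module
rng = np.random.default_rng(137)
values = rng.integers(0, 1001, size=(2, 1000))  # 2000 values in 0..1000

# np.histogram flattens the array and returns counts plus bin edges
hist, bins = np.histogram(values, bins=10, range=(0, 1000))

print("Bin edges:", bins)
print("Counts:", hist)
print("Total counted:", hist.sum())
```

With 2,000 uniformly drawn values spread over 10 bins, each bucket should hold roughly 200 values.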
