Using the NumPy data structures and methods

The term NumPy is short for Numerical Python; the library name is numpy. The library provides arrays with much more efficient storage and faster operations than Python's built-in lists and dictionaries. Unlike lists, all elements of a numpy array must share a single data type. The following code imports the numpy package with the alias np and checks the version of the library. It then creates two one-dimensional arrays from two lists, one with an implicit integer element data type, and one with an explicit float data type:

import numpy as np 
np.__version__ 
np.array([1, 2, 3, 4]) 
np.array([1, 2, 3, 4], dtype = "float32") 
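
The single data type rule can be seen in a minimal sketch like the one below. The variable names are illustrative only, and the exact integer width that numpy infers depends on the platform:

```python
import numpy as np

# A list of integers yields an integer array (the width is platform-dependent).
ints = np.array([1, 2, 3, 4])

# Mixing integers and floats upcasts every element to a float type.
mixed = np.array([1, 2.5, 3])

# An explicit dtype overrides the inferred one.
floats32 = np.array([1, 2, 3, 4], dtype="float32")

print(ints.dtype, mixed.dtype, floats32.dtype)
```

Because one type applies to every element, numpy converts mixed input to a common type rather than storing each element with its own type, as a list does.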

You can create multi-dimensional arrays as well. The following code creates three arrays with three rows and five columns, one filled with zeros, one with ones, and one with the value 3.14, an approximation of pi. Note the functions used for populating arrays:

np.zeros((3, 5), dtype = int) 
np.ones((3, 5), dtype = int) 
np.full((3, 5), 3.14) 

For the sake of brevity, I am showing only the last array here:

    array([[ 3.14,  3.14,  3.14,  3.14,  3.14],
           [ 3.14,  3.14,  3.14,  3.14,  3.14],
           [ 3.14,  3.14,  3.14,  3.14,  3.14]])
  

There are many additional functions that help you populate your arrays. The following code creates four different arrays. The first line creates a linear sequence of numbers from 0 to 20 with step 2, returning every second number. Note that the upper bound 20 is not included in the array. The second line creates 10 uniformly distributed numbers between 0 and 1. The third line creates 10 numbers drawn from the standard normal distribution with mean 0 and standard deviation 1. The fourth line creates a 3 by 3 matrix of uniformly distributed integers between 0 and 9:

np.arange(0, 20, 2) 
np.random.random((1, 10)) 
np.random.normal(0, 1, (1, 10)) 
np.random.randint(0, 10, (3, 3)) 

Again, for the sake of brevity, I am showing only the last result here:

    array([[0, 1, 7],
           [5, 9, 4],
           [5, 5, 6]])
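
Random results differ on every run, so a seeded generator is useful when you want to repeat an experiment. The following is a minimal sketch that assumes NumPy 1.17 or later, where np.random.default_rng() is available; the seed value 42 and the variable names are arbitrary choices, not part of the original example:

```python
import numpy as np

# arange excludes the stop value: 0, 2, ..., 18.
evens = np.arange(0, 20, 2)

# Assumption: NumPy 1.17 or later, which provides default_rng.
rng = np.random.default_rng(seed=42)  # the seed value is arbitrary

uniform = rng.random((1, 10))           # uniform on [0, 1)
normal = rng.normal(0, 1, (1, 10))      # mean 0, standard deviation 1
integers = rng.integers(0, 10, (3, 3))  # integers 0..9, upper bound excluded

print(evens)
print(integers)
```

Creating a new generator with the same seed reproduces the same sequence of draws.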
  

Arrays have many attributes. We will check only a few of them. The following code shows an example of how to get the dimensionality and the shape of an array. The result shows you that you created a two-dimensional array with three rows and four columns:

arr1 = np.random.randint(0, 12, size = (3, 4)) 
arr1.ndim 
arr1.shape 
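
A few more attributes are worth knowing. The sketch below uses a separate array named arr (an illustrative name) so that it runs on its own; the reported element type and byte counts depend on the platform:

```python
import numpy as np

arr = np.random.randint(0, 12, size=(3, 4))

print(arr.ndim)      # number of dimensions: 2
print(arr.shape)     # (3, 4)
print(arr.size)      # total number of elements: 12
print(arr.dtype)     # element type; the integer width depends on the platform
print(arr.itemsize)  # bytes per element
print(arr.nbytes)    # total bytes, equal to size * itemsize
```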

You can access array elements by their position with zero-based indexes. You can also use negative indexes, which count elements backwards from the end. The following code illustrates this:

arr1 
arr1[1, 2] 
arr1[0, -1] 

The result of the preceding code is:

    array([[ 6,  8,  8, 10],
           [ 1,  6,  7,  7],
           [ 8,  1,  5,  9]])
    7
    10
  

First, the code lists the array created in the previous example, the two-dimensional array with three rows and four columns with random integers. Then the code reads the third element of the second row. Then the code retrieves the last element from the first row.

Besides retrieving a single element, you can also slice an array. In the next example, the first line retrieves the second row of the array arr1, and the second line retrieves the second column:

arr1[1, :] 
arr1[:, 1] 

Here are the results of the preceding code:

    array([1, 6, 7, 7])
    array([8, 6, 1])
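
Slices can also select a range of rows and columns. One detail to keep in mind is that a basic slice is a view of the original data, not a copy. The following sketch hard-codes the values from the arr1 output shown earlier under the illustrative name arr:

```python
import numpy as np

# The same values as the arr1 output shown earlier, hard-coded here.
arr = np.array([[6, 8, 8, 10],
                [1, 6, 7, 7],
                [8, 1, 5, 9]])

block = arr[:2, 1:3]   # first two rows, second and third columns
print(block)           # rows [8 8] and [6 7]

# Basic slices are views: writing to the slice changes the original array.
block[0, 0] = 99
print(arr[0, 1])       # 99

# Use copy() when an independent array is needed.
independent = arr[:, 1].copy()
independent[0] = -1
print(arr[0, 1])       # still 99
```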
  

You can concatenate or stack the arrays. The following code creates three arrays, and then concatenates the first two arrays along the rows (axis 0), and the first and the third along the columns (axis 1):

a1 = np.array([[1, 2, 3], 
               [4, 5, 6]]) 
a2 = np.array([[7, 8, 9], 
               [10, 11, 12]]) 
a3 = np.array([[10], 
               [11]]) 
np.concatenate([a1, a2], axis = 0) 
np.concatenate([a1, a3], axis = 1) 

Here is the result:

    array([[ 1,  2,  3],
           [ 4,  5,  6],
           [ 7,  8,  9],
           [10, 11, 12]])
    array([[ 1,  2,  3, 10],
           [ 4,  5,  6, 11]])
  

You can also use the np.vstack() and np.hstack() functions to stack the arrays vertically or horizontally, over the first or over the second axis. The following code produces the same result as the previous code, which uses the np.concatenate() function:

np.vstack([a1, a2]) 
np.hstack([a1, a3]) 

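You can stack only conformable arrays, that is, arrays whose shapes match along every axis except the one you stack over. The following line produces an error, because a1 has three columns and a3 has only one:

np.vstack([a1, a3]) 

By contrast, np.hstack([a1, a2]) succeeds, because both arrays have two rows; it returns an array with two rows and six columns.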
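
If you are not sure whether two arrays are conformable, you can compare their shapes first or catch the error. The following is a minimal sketch; the stacked flag and the printed message are illustrative only:

```python
import numpy as np

a1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
a3 = np.array([[10], [11]])             # shape (2, 1)

# Vertical stacking needs the same number of columns, so this fails.
try:
    np.vstack([a1, a3])
    stacked = True
except ValueError as err:
    stacked = False
    print("vstack failed:", err)

# Horizontal stacking needs the same number of rows, so this works.
side_by_side = np.hstack([a1, a3])
print(side_by_side.shape)   # (2, 4)
```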

In order to perform some calculations on array elements, you could use mathematical functions from Python's standard math module, and operate in loops, element by element. However, the numpy library also includes vectorized versions of the functions, which operate on vectors and matrices as a whole, and are much faster than the basic ones. The following code creates a 3 by 3 array of integers from 0 to 8, shows the array, and then calculates the sine of each element using a numpy vectorized function:

x = np.arange(0, 9).reshape((3, 3)) 
x 
np.sin(x) 

And here is the result:

    array([[0, 1, 2],
           [3, 4, 5],
           [6, 7, 8]])
    array([[ 0.        ,  0.84147098,  0.90929743],
           [ 0.14112001, -0.7568025 , -0.95892427],
           [-0.2794155 ,  0.6569866 ,  0.98935825]])
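
To make the contrast with the loop approach concrete, the following sketch computes the same sines both ways and checks that the results agree. The names looped and vectorized are illustrative; no timing is measured here:

```python
import math
import numpy as np

x = np.arange(0, 9).reshape((3, 3))

# Element-by-element loop using the standard library.
looped = np.empty(x.shape)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        looped[i, j] = math.sin(x[i, j])

# One vectorized call over the whole array.
vectorized = np.sin(x)

print(np.allclose(looped, vectorized))
```

The vectorized call replaces both loops with a single expression, which is also where the speed advantage on large arrays comes from.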
  

NumPy also includes vectorized aggregate functions. You can use them for a quick overview of the data in an array, using the descriptive statistics calculations. The following code initializes an array of five sequential numbers:

x = np.arange(1,6) 
x 

Here is the array:

    array([1, 2, 3, 4, 5])
  

Now you can calculate the sum and the product of the elements, the minimum and the maximum, the mean, and the standard deviation:

np.sum(x), np.prod(x) 
np.min(x), np.max(x) 
np.mean(x), np.std(x) 

Here are the results:

    (15, 120)
    (1, 5)
    (3.0, 1.4142135623730951)
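
On multi-dimensional arrays, the aggregate functions accept an axis argument. Note also that np.std() returns the population standard deviation by default; passing ddof=1 gives the sample standard deviation. The following sketch uses an illustrative 2 by 3 array named m:

```python
import numpy as np

m = np.arange(1, 7).reshape((2, 3))   # [[1, 2, 3], [4, 5, 6]]

print(np.sum(m))           # 21, over all elements
print(np.sum(m, axis=0))   # column sums: 5, 7, 9
print(np.sum(m, axis=1))   # row sums: 6, 15

x = np.arange(1, 6)
print(np.std(x))           # population standard deviation (ddof=0)
print(np.std(x, ddof=1))   # sample standard deviation
```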
  

In addition, you can also calculate running aggregates, like a running sum in the following code:

np.add.accumulate(x) 

The running sum result is here:

    array([ 1,  3,  6, 10, 15], dtype=int32)
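
The same running aggregates are available through the shorthand functions np.cumsum() and np.cumprod(). The following sketch shows a running sum and a running product; the variable names are illustrative:

```python
import numpy as np

x = np.arange(1, 6)

running_sum = np.cumsum(x)                    # same values as np.add.accumulate(x)
running_product = np.multiply.accumulate(x)   # same values as np.cumprod(x)

print(running_sum)       # 1, 3, 6, 10, 15
print(running_product)   # 1, 2, 6, 24, 120
```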
  

There are many more operations on arrays available in the numpy module. However, I am switching to the next topic of data science on this Python learning tour, to the pandas library.
