Chapter 3. Using NumPy Arrays

The beauty of NumPy Arrays is that you can use array indexing and slicing to quickly access your data or perform a computation, while keeping the efficiency of C arrays. Plenty of mathematical operations are supported as well. In this chapter, we will take an in-depth look at using NumPy Arrays. After this chapter, you will feel comfortable using NumPy Arrays and the bulk of their functionality.

Here is a list of topics that will be covered in this chapter:

  • Basic operations and the attributes of NumPy Arrays
  • Universal functions (ufuncs) and helper functions
  • Broadcasting rules and shape manipulation
  • Masking NumPy Arrays

Vectorized operations

All NumPy operations are vectorized: you apply an operation to the whole array instead of to each element individually. This is not just neat and handy but also improves computational performance compared to using loops. In this section, we will experience the power of NumPy's vectorized operations. A key idea worth keeping in mind before we start exploring this subject is to always think in terms of entire arrays rather than individual elements; this will help you enjoy learning about NumPy Arrays and their performance. Let's start by doing some simple calculations with scalars and between NumPy Arrays:

In [1]: import numpy as np 
In [2]: x = np.array([1, 2, 3, 4]) 
In [3]: x + 1 
Out[3]: array([2, 3, 4, 5]) 

The value 1 is added to every element of the array at once. This is very different from Python lists or most other programming languages. The elements in a NumPy Array all share the same dtype; in the preceding example, this is NumPy's default integer type (either 32-bit or 64-bit, depending on the machine). Therefore, NumPy can skip the per-element type checking at runtime that Python ordinarily performs and simply apply the arithmetic operation to the whole array:

In [4]: y = np.array([-1, 2, 3, 0]) 
In [5]: x * y 
Out[5]: array([-1,  4,  9,  0]) 

The two NumPy Arrays are multiplied element by element. In the preceding example, the two arrays have equal shapes, so no broadcasting is applied here (we will explain operations on arrays of different shapes and the broadcasting rules in a later section). The first element in array x is multiplied by the first element in array y, and so on. One important point to note here is that arithmetic operations between two NumPy Arrays are not matrix multiplications; the result is an array with the same shape as the inputs. Matrix multiplication in NumPy uses numpy.dot(). Take a look at this example:

In [6]: np.dot(x, y) 
Out[6]: 12 
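
If you prefer operator syntax, Python 3.5 and later (with NumPy 1.10 or later) also provide the @ operator for matrix multiplication. The following short sketch, which assumes such an environment, computes the same inner product as the numpy.dot() call above:

import numpy as np

x = np.array([1, 2, 3, 4])
y = np.array([-1, 2, 3, 0])

# Element-wise product keeps the shape of the inputs
elementwise = x * y          # array([-1, 4, 9, 0])

# Both lines compute the inner (dot) product: 1*-1 + 2*2 + 3*3 + 4*0 = 12
dot_result = np.dot(x, y)    # 12
at_result = x @ y            # 12, requires Python 3.5+ / NumPy 1.10+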

NumPy also supports logical comparisons between two arrays, and the comparisons are vectorized as well. The result is a Boolean NumPy Array that indicates, element by element, where the two arrays are equal. If two arrays with different shapes are compared, the result is just a single False, which indicates that the two arrays are different, without actually comparing each element:

In [7]: x == y 
Out[7]: array([False,  True,  True, False], dtype=bool) 
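
As a rough sketch of the shape-mismatch case mentioned above, comparing arrays whose shapes cannot be matched does not produce an element-wise result. On the NumPy versions this book targets, the comparison simply evaluates to the scalar False (typically with a DeprecationWarning); recent NumPy releases may raise an error here instead, so treat this only as an illustration of the idea:

z = np.array([1, 2, 3])      # a different length from x
result = x == z              # no element-wise comparison is possible here
# Older NumPy releases return the scalar False (with a warning);
# newer releases may raise an error for mismatched shapes.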

From the preceding examples, we get an insight into NumPy's element-wise operations, but what's the benefit of using them? How can we tell that an optimization has actually been made? We will use the %timeit magic function in IPython, which was introduced in the last chapter, to show the difference between a NumPy operation and a Python for loop:

In [8]: x = np.arange(10000) 
In [9]: %timeit x + 1 
100000 loops, best of 3: 12.6 µs per loop 
In [10]: y = range(10000) 
In [11]: %timeit [i + 1 for i in y] 
1000 loops, best of 3: 458 µs per loop 

The two variables, x and y, have the same length and do the same kind of work: adding a value to every element. With the help of NumPy operations, the performance is far better than that of an ordinary Python for loop (we use a list comprehension here for neat code, which is itself faster than an ordinary for loop, yet NumPy still comes out well ahead). Knowing this huge difference can help you speed up your code by replacing loops with NumPy operations.

As we mentioned in the previous examples, the improvement in performance is due to the consistent dtype in a NumPy Array. A tip that can help you use NumPy Arrays correctly is to always consider the dtype before you apply any operation, just as you would in most other programming languages. The following example shows how the same operation can give hugely different results depending on the dtype of the array:

In [12]: x = np.arange(1,9) 
In [13]: x.dtype 
Out[13]: dtype('int32') 
In [14]: x = x / 10.0 
In [15]: x 
Out[15]: array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8]) 
In [16]: x.dtype 
Out[16]: dtype('float64') 
In [17]: y = np.arange(1,9) 
In [18]: y /= 10.0 
In [19]: y 
Out[19]: array([0, 0, 0, 0, 0, 0, 0, 0]) 
In [20]: y.dtype 
Out[20]: dtype('int32') 

The two variables x and y start out exactly the same: both are numpy.int32 arrays ranging from 1 to 8 (you might get numpy.int64 on a 64-bit machine), and both are divided by the float 10.0. However, when x is divided by a float, a new NumPy Array with dtype = numpy.float64 is created. This is a completely new array that simply reuses the variable name x, which is why the dtype of x changes. On the other hand, y uses the /= operator, which always honors the dtype of the y array. So, when it is divided by 10.0, no new array is created; only the values of the elements of y change, and the dtype is still numpy.int32. This is why x and y end up as two different arrays. Note that, from version 1.10 onward, NumPy no longer allows the float result to be silently cast back to an integer in this in-place operation; a TypeError is raised instead.
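
If you do want y to hold the floating-point result, a simple workaround is to give the array a floating-point dtype before (or instead of) the in-place division. The following is a minimal sketch of that idea, not the only way to do it:

y = np.arange(1, 9)

# Option 1: make a float copy first, then the in-place division is safe
y = y.astype(np.float64)
y /= 10.0                    # array([0.1, 0.2, ..., 0.8]), dtype float64

# Option 2: skip the in-place form and let NumPy upcast the result
z = np.arange(1, 9) / 10.0   # also a float64 array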
