Accessing arrays

On the surface, the NumPy array interface is similar to that of Python lists. NumPy arrays can be indexed using integers and iterated over with a for loop:

    import numpy as np

    A = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8])
    A[0] 
    # Result:
    # 0 

    [a for a in A] 
    # Result:
    # [0, 1, 2, 3, 4, 5, 6, 7, 8]

In NumPy, array elements and sub-arrays can be conveniently accessed by using multiple values separated by commas inside the subscript operator, []. If we take a (3,3) array (an array containing three triplets), and we access the element with index 0, we obtain the first row, as follows:

    A = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) 
    A[0] 
    # Result:
    # array([0, 1, 2])

We can index the row again by adding another index separated by a comma. To get the second element of the first row, we can use the (0, 1) index. An important observation is that the A[0, 1] notation is actually shorthand for A[(0, 1)]; that is, we are actually indexing using a tuple! Both versions are shown in the following snippet:

    A[0, 1] 
    # Result:
    # 1

    # Equivalent version using tuple
    A[(0, 1)]

NumPy allows you to slice arrays along multiple dimensions. If we slice on the first dimension, we obtain a collection of triplets, as follows:

    A[0:2] 
    # Result:
    # array([[0, 1, 2], 
    #        [3, 4, 5]])

If we slice the array again on the second dimension with 0:2, we are basically extracting the first two elements from the collection of triplets shown earlier. This results in an array of shape (2, 2), shown in the following code:

    A[0:2, 0:2] 
    # Result:
    # array([[0, 1], 
    #        [3, 4]])

As you might expect, you can update the values in the array using both numerical indexes and slices. An example is illustrated in the following code snippet:

    A[0, 1] = 8 
    A[0:2, 0:2] = [[1, 1], [1, 1]]
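
The two assignments above can be traced in a small self-contained sketch that prints the array afterward. The third assignment is an addition not in the original text: it assumes NumPy's usual behavior of broadcasting a scalar across every element selected by a slice.

```python
import numpy as np

A = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

A[0, 1] = 8                      # overwrite a single element
A[0:2, 0:2] = [[1, 1], [1, 1]]   # overwrite the top-left 2x2 block
A[:, 2] = 0                      # extra illustration: a scalar fills the whole slice

print(A)
# [[1 1 0]
#  [1 1 0]
#  [6 7 0]]
```

Note that the first assignment is overwritten by the second, since element (0, 1) lies inside the 2x2 block.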

Indexing with the slicing syntax is very fast because, unlike lists, it doesn't produce a copy of the array. In NumPy's terminology, it returns a view of the same memory area. If we take a slice of the original array, and then we change one of its values, the original array will be updated as well. The following code illustrates an example of this feature:

    a = np.array([1, 1, 1, 1])
    a_view = a[0:2] 
    a_view[0] = 2 
    print(a) 
    # Output:
    # [2 1 1 1]
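
When an independent array is needed, one option is to copy the slice explicitly. The following sketch contrasts a view with a copy; the name a_copy is hypothetical, and np.shares_memory is used here only to confirm which arrays overlap in memory.

```python
import numpy as np

a = np.array([1, 1, 1, 1])

a_view = a[0:2]          # slicing returns a view on the same memory
a_copy = a[0:2].copy()   # .copy() allocates independent memory

a_copy[0] = 5            # does not touch a
a_view[1] = 3            # writes through to a

print(a)                             # [1 3 1 1]
print(np.shares_memory(a, a_view))   # True
print(np.shares_memory(a, a_copy))   # False
```

The copy costs extra memory and time, which is exactly what the view avoids.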

It is important to be extra careful when mutating NumPy arrays. Since views share data, changing the values of a view can result in hard-to-find bugs. To prevent side effects, you can set a.flags.writeable = False, which makes the array read-only and prevents accidental mutation through it or through any view created afterward. Views that already exist when the flag is set keep their own writeable flag.
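
As a minimal sketch of this safeguard, the following code locks an array and then tries to write through a view taken afterward. NumPy is expected to reject the write with a ValueError.

```python
import numpy as np

a = np.array([1, 1, 1, 1])
a.flags.writeable = False   # lock the array

a_view = a[0:2]             # a view created after locking is read-only as well

try:
    a_view[0] = 2
except ValueError as exc:
    print("write rejected:", exc)

print(a)   # [1 1 1 1]
```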

We can take a look at another example that shows how the slicing syntax can be used in a real-world setting. We define an r_i array, shown in the following line of code, which contains a set of 10 coordinates (x, y). Its shape will be (10, 2):

    r_i = np.random.rand(10, 2)

If you have a hard time distinguishing arrays that differ in axis order, for example, between an array of shape (10, 2) and one of shape (2, 10), it is useful to think that every time you say the word "of", you should introduce a new dimension. An array with ten elements of size two will be (10, 2). Conversely, an array with two elements of size ten will be (2, 10).
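
The rule of thumb can be checked quickly. In this sketch, the names ten_of_two and two_of_ten are hypothetical and chosen only to mirror the wording above.

```python
import numpy as np

ten_of_two = np.zeros((10, 2))   # ten elements of size two
two_of_ten = np.zeros((2, 10))   # two elements of size ten

# len() counts elements along the first axis; indexing with [0] yields one element
print(ten_of_two.shape, len(ten_of_two), ten_of_two[0].shape)   # (10, 2) 10 (2,)
print(two_of_ten.shape, len(two_of_ten), two_of_ten[0].shape)   # (2, 10) 2 (10,)
```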

A typical operation we may be interested in is extracting the x component from each coordinate. In other words, we want the items at (0, 0), (1, 0), (2, 0), and so on, resulting in an array of shape (10,). It is helpful to think of the first index as moving while the second one is fixed (at 0). With this in mind, we slice every index on the first axis (the moving one) and take the first element (the fixed one) on the second axis, as shown in the following line of code:

    x_i = r_i[:, 0]

On the other hand, the following expression will keep the first index fixed and the second index moving, returning the first (x, y) coordinate:

    r_0 = r_i[0, :]

Slicing all the indexes over the last axis is optional; using r_i[0] has the same effect as r_i[0, :].
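
To make the results easy to inspect, the following sketch repeats these slices on a deterministic stand-in for r_i, built with np.arange instead of random numbers (an assumption made purely for readability).

```python
import numpy as np

# Deterministic stand-in for the random coordinates: rows are (x, y) pairs
r_i = np.arange(20).reshape(10, 2)

x_i = r_i[:, 0]   # first index moving, second fixed at 0
r_0 = r_i[0, :]   # first index fixed at 0, second moving

print(x_i.shape)   # (10,)
print(x_i)         # [ 0  2  4  6  8 10 12 14 16 18]
print(r_0)         # [0 1]
print(np.array_equal(r_i[0], r_i[0, :]))   # True
```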

NumPy allows you to index an array using another NumPy array made of either integer or Boolean values--a feature called fancy indexing.

If you index an array (say, a) with another array of integers (say, idx), NumPy will interpret the integers as indexes and will return an array containing their corresponding values. If we index an array containing 10 elements with np.array([0, 2, 3]), we obtain an array of shape (3,) containing the elements at positions 0, 2, and 3. The following code gives us an illustration of this concept:

    a = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0]) 
    idx = np.array([0, 2, 3]) 
    a[idx] 
    # Result:
    # array([9, 7, 6])

You can use fancy indexing on multiple dimensions by passing an array for each dimension. If we want to extract the (0, 2) and (3, 1) elements, we have to pack all the indexes acting on the first axis in one array, and the ones acting on the second axis in another. This can be seen in the following code:

    a = np.array([[0, 1, 2], [3, 4, 5], 
                  [6, 7, 8], [9, 10, 11]]) 
    idx1 = np.array([0, 3])  # row indexes (first axis)
    idx2 = np.array([2, 1])  # column indexes (second axis)
    a[idx1, idx2]
    # Result:
    # array([ 2, 10])

You can also use normal lists as index arrays, but not tuples. For example, the following two statements are equivalent:

    a[np.array([0, 1])] # is equivalent to
    a[[0, 1]]

However, if you use a tuple, NumPy will interpret the following statement as an index on multiple dimensions:

    a[(0, 1)] # is equivalent to
    a[0, 1]
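
The difference is easiest to see from what each form returns. This sketch uses a (4, 3) array matching the one defined earlier; the names rows and item are hypothetical.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5],
              [6, 7, 8], [9, 10, 11]])

rows = a[[0, 1]]   # list: fancy indexing on the first axis, selects rows 0 and 1
item = a[(0, 1)]   # tuple: one index per axis, same as a[0, 1]

print(rows.shape)  # (2, 3)
print(item)        # 1
```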

The index arrays are not required to be one-dimensional; we can extract elements from the original array in any shape. For example, we can select elements from the original array to form a (2,2) array, as shown:

    idx1 = [[0, 1], [3, 2]] 
    idx2 = [[0, 2], [1, 1]] 
    a[idx1, idx2] 
    # Output: 
    # array([[ 0,  5],
    #        [10,  7]])
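
One way to read this result is that the output takes the shape of the index arrays, and each output position pairs the matching entries of idx1 and idx2. The following sketch checks this against explicit Python loops; the name expected is hypothetical.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5],
              [6, 7, 8], [9, 10, 11]])
idx1 = np.array([[0, 1], [3, 2]])
idx2 = np.array([[0, 2], [1, 1]])

result = a[idx1, idx2]

# The same selection written out element by element
expected = np.array([[a[i, j] for i, j in zip(row1, row2)]
                     for row1, row2 in zip(idx1, idx2)])

print(result.shape)                       # (2, 2)
print(np.array_equal(result, expected))   # True
```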

The array slicing and fancy-indexing features can be combined. This is useful, for instance, when we want to swap the x and y columns in a coordinate array. In the following code, the first index will be running over all the elements (a slice) and, for each of those, we extract the element in position 1 (the y) first and then the one in position 0 (the x):

    r_i = np.random.rand(10, 2) 
    r_i[:, [0, 1]] = r_i[:, [1, 0]]
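
The swap is safe because fancy indexing on the right-hand side produces a copy before anything is written back. A small sketch with fixed values (chosen here only for illustration) makes the effect visible:

```python
import numpy as np

r_i = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])   # rows are (x, y)

r_i[:, [0, 1]] = r_i[:, [1, 0]]   # right-hand side is a copy, so no value is lost

print(r_i.tolist())   # [[10.0, 1.0], [20.0, 2.0], [30.0, 3.0]]
```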

When the index array is of the bool type, the rules are slightly different. The bool array will act like a mask; every element corresponding to True will be extracted and put in the output array. This procedure is shown in the following code:

    a = np.array([0, 1, 2, 3, 4, 5]) 
    mask = np.array([True, False, True, False, False, False]) 
    a[mask] 
    # Output:
    # array([0, 2])

The same rules apply when dealing with multiple dimensions. Furthermore, if the index array has the same shape as the original array, the elements corresponding to True will be selected and returned in a one-dimensional array.
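
A short sketch of masking in two dimensions follows. Building the mask with a comparison and assigning through the mask are additions not covered in the text above, shown here as common uses rather than as the only approach; the names mask and row_mask are hypothetical.

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5]])

mask = a > 2          # Boolean array with the same shape as a
print(mask.shape)     # (2, 3)
print(a[mask])        # [3 4 5]  (a one-dimensional result)

a[mask] = 0           # a mask can also be used on the left-hand side
print(a.tolist())     # [[0, 1, 2], [0, 0, 0]]

row_mask = np.array([True, False])   # a shorter mask selects along the first axis
print(a[row_mask].tolist())          # [[0, 1, 2]]
```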

Indexing in NumPy is a reasonably fast operation. When speed is critical, however, the numpy.take and numpy.compress functions can squeeze out a little more performance; the gap depends on the NumPy version and the machine, so it is worth measuring on your own setup. The first argument of numpy.take is the array we want to operate on, and the second is the list of indexes we want to extract. The last argument is axis; if it is not provided, the indexes act on the flattened array; otherwise, they act along the specified axis:

    r_i = np.random.rand(100, 2) 
    idx = np.arange(50) # integers 0 to 49 

    %timeit np.take(r_i, idx, axis=0) 
    1000000 loops, best of 3: 962 ns per loop 

    %timeit r_i[idx] 
    100000 loops, best of 3: 3.09 us per loop

The analogous function for Boolean arrays is numpy.compress, which works in the same way and is also faster than plain indexing in this measurement. Its use is shown as follows:

    idx = np.ones(100, dtype='bool') # all True values 

    %timeit np.compress(idx, r_i, axis=0) 
    1000000 loops, best of 3: 1.65 us per loop 

    %timeit r_i[idx] 
    100000 loops, best of 3: 5.47 us per loop
