Benefits and characteristics of NumPy arrays

NumPy arrays have several advantages over Python lists. These benefits are focused on providing high-performance manipulation of sequences of homogenous data items. Several of these benefits are as follows:

  • Contiguous allocation in memory
  • Vectorized operations
  • Boolean selection
  • Sliceability

Contiguous allocation in memory provides benefits in performance by ensuring that all elements of an array are directly accessible at a fixed offset from the beginning of the array. This also is a computer organization technique that facilities providing vectorized operations across arrays.

Vectorized operation is a technique of applying an operation across all or a subset of elements without explicit coding of loops. Vectorized operations are often orders of magnitude more efficient in execution as compared to loops implemented in a higher-level language. They are also excellent for reducing the amount of code that needs to be written, which also helps in minimizing coding errors.

To demonstrate both of these benefits, the following example calculates the time required by the for loop in Python to square a list consisting of 100,000 sequential integers:

In [2]:
   # a function that squares all the values
   # in a sequence
   def squares(values):
       result = []
       for v in values:
           result.append(v * v)
       return result
   
   # create 100,000 numbers using python range
   to_square = range(100000)
   # time how long it takes to repeatedly square them all
   %timeit squares(to_square)

   100 loops, best of 3: 14 ms per loop

Using NumPy and vectorized arrays, the example can be rewritten as follows.

In [3]:
   # now lets do this with a numpy array
   array_to_square = np.arange(0, 100000)
   # and time using a vectorized operation
   %timeit array_to_square ** 2

   10000 loops, best of 3: 77.4 µs per loop

Vectorization of the operation made our code simpler and also performed roughly 158 times faster!

This brings up something to keep in mind when working with data in NumPy and pandas: if you find yourself coding a loop to iterate across elements of a NumPy array, or a pandas Series or DataFrame, then you are, as they say, doing it wrong. Always keep in mind to write code that makes use of vectorization. It is almost always faster, as well as more elegantly expressed in a vectorized manner.

Boolean selection is a common pattern that we will see with NumPy and pandas where selection of elements from an array is based on specific logical criteria. This consists of calculating an array of Boolean values where True represents that the given item should be in the result set. This array can then be used to efficiently select the matching items.

Sliceability provides the programmer with a very efficient means to specify multiple elements in an array using a convenient notation. Slicing becomes invaluable when working with data in an ad hoc manner. The slicing process also benefits from being able to take advantage of the contiguous memory allocation of arrays to optimize access to series of items.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.137.41.205