Creating NumPy arrays and performing basic array operations

A NumPy array can be created using multiple techniques. The following code creates a new NumPy array object from a Python list:

In [4]:
   # a simple array
   a1 = np.array([1, 2, 3, 4, 5])
   a1

Out[4]:
   array([1, 2, 3, 4, 5])

In [5]:
   # what is its type?
   type(a1)

Out[5]:
   numpy.ndarray

In [6]:
   # how many elements?
   np.size(a1)

Out[6]:
   5

NumPy's n-dimensional array type is named ndarray. This array contains five elements, as reported by the np.size() function.

NumPy arrays must have all of their elements of the same type. If you specify different types in the list, NumPy will try to coerce all the items to the same type. The following code example demonstrates using integer and floating-point values to initialize the array, which are then converted to floating-point numbers by NumPy:

In [7]:
   # any float in the sequence makes
   # it an array of floats
   a2 = np.array([1, 2, 3, 4.0, 5.0])
   a2

Out[7]:
   array([ 1.,  2.,  3.,  4.,  5.])

In [8]:
   # array is all of one type (float64 in this case)
   a2.dtype

Out[8]:
   dtype('float64')

The types of the items in an array can be checked with the dtype property, which in this example shows that NumPy converted all the items to float64.
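As a brief sketch of controlling this coercion, the type can also be requested explicitly through the dtype parameter of np.array(), or an existing array can be converted with its .astype() method. The variable names below are illustrative and are not part of the session above:

```python
import numpy as np

# integers only: NumPy picks an integer dtype
ints = np.array([1, 2, 3])

# the same values, explicitly requested as floating point
floats = np.array([1, 2, 3], dtype=float)

# astype returns a new array of the requested type;
# the original array is left unchanged
as_float = ints.astype(np.float64)

print(ints.dtype.kind)   # 'i' (signed integer; exact width is platform dependent)
print(floats.dtype)      # float64
print(as_float.dtype)    # float64
```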

An array of a specific size can be created in multiple ways. The following code repeats a single-item Python list ten times and uses the result to initialize an array of 10 items:

In [9]:
   # shorthand to repeat a sequence 10 times
   a3 = np.array([0]*10)
   a3

Out[9]:
   array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

An array can also be initialized with sequential values using the Python range() function. The following code initializes with ten items from 0 through 9:

In [10]:
   # convert a python range to numpy array
   np.array(range(10))

Out[10]:
   array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Since the last two examples first build a Python list or range and then convert it, they are not the most efficient way to allocate an array. To efficiently create an array of a specific size that is initialized with zeros, use the np.zeros() function as shown in the following code:

In [11]:
   # create a numpy array of 10 0.0's
   np.zeros(10)

Out[11]:
   array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])

The default is to create floating-point numbers. This can be changed to integers using the dtype parameter, as shown in the following example:

In [12]:
   # force it to be of int instead of float64
   np.zeros(10, dtype=int)

Out[12]:
   array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
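The same pattern extends to other shapes and fill values. The following is a minimal sketch, assuming the related np.ones() and np.full() functions, which are not used elsewhere in this section; the variable names are illustrative:

```python
import numpy as np

# the size can be a tuple, here 2 rows by 3 columns of integer zeros
zeros_2d = np.zeros((2, 3), dtype=int)

# five 1.0 values (float64 by default, like np.zeros)
ones = np.ones(5)

# four copies of an arbitrary fill value
sevens = np.full(4, 7)

print(zeros_2d.shape)     # (2, 3)
print(ones.tolist())      # [1.0, 1.0, 1.0, 1.0, 1.0]
print(sevens.tolist())    # [7, 7, 7, 7]
```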

NumPy provides the np.arange() function to create a NumPy array consisting of sequential values from a specified start value up to, but not including, the specified end value:

In [13]:
   # make "a range" starting at 0 and with 10 values
   np.arange(0, 10)

Out[13]:
   array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

A step value can also be provided to np.arange(). The following examples generate the even numbers from 0 up to, but not including, 10, and then an array of decreasing values by specifying a step of -1:

In [14]:
   # 0 <= x < 10 increment by two
   np.arange(0, 10, 2)

Out[14]:
   array([0, 2, 4, 6, 8])

In [15]:
   # 10 >= x > 0, counting down
   np.arange(10, 0, -1)

Out[15]:
   array([10,  9,  8,  7,  6,  5,  4,  3,  2,  1])
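A few further cases of np.arange() are worth a quick sketch: the single-argument form, a fractional step, and a step that points away from the stop value. The variable names are illustrative:

```python
import numpy as np

# with one argument, the start defaults to 0
first_five = np.arange(5)
print(first_five.tolist())   # [0, 1, 2, 3, 4]

# a fractional step produces floating-point values
quarters = np.arange(0, 1, 0.25)
print(quarters.tolist())     # [0.0, 0.25, 0.5, 0.75]

# if the step points away from the stop value, the result is empty
empty = np.arange(0, 10, -1)
print(len(empty))            # 0
```

For fractional steps that cannot be represented exactly in binary floating point, the number of items returned can be hard to predict, so a function that takes an explicit item count is often the safer choice.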

The np.linspace() function is similar to np.arange(), but generates an array of a specific number of items between the specified start and stop values:

In [16]:
   # evenly spaced numbers between two endpoints
   np.linspace(0, 10, 11)

Out[16]:
   array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9., 10.])

Note

The datatype of the resulting array is float by default, and both the start and end values are included.
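To make the inclusive behavior concrete, the following sketch requests five items between 0 and 10, and then repeats the call with the endpoint parameter set to False so that the stop value is excluded. The variable names are illustrative:

```python
import numpy as np

# five evenly spaced values, both endpoints included
inclusive = np.linspace(0, 10, 5)
print(inclusive.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]

# endpoint=False leaves out the stop value, as np.arange() does
exclusive = np.linspace(0, 10, 5, endpoint=False)
print(exclusive.tolist())   # [0.0, 2.0, 4.0, 6.0, 8.0]
```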

NumPy arrays will vectorize many mathematical operators. The following example creates a 10-element array and then multiplies each element by a constant:

In [17]:
   # multiply numpy array by 2
   a1 = np.arange(0, 10)
   a1 * 2

Out[17]:
   array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

It is also possible to apply a mathematical operator across two arrays:

In [18]:
   # add two numpy arrays
   a2 = np.arange(10, 20)
   a1 + a2

Out[18]:
   array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
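As a rough illustration of what vectorization replaces, the following sketch performs the same addition with an explicit Python loop and with a single array expression, and then applies two other operators. The names x, y, looped, and vectorized are illustrative:

```python
import numpy as np

x = np.arange(0, 10)
y = np.arange(10, 20)

# element-wise addition written as an explicit Python loop
looped = [int(a) + int(b) for a, b in zip(x, y)]

# the same computation as a single vectorized expression
vectorized = x + y
print(vectorized.tolist() == looped)   # True

# other arithmetic operators are also applied element by element
squares = x ** 2
difference = y - x
print(squares.tolist())      # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(difference.tolist())   # [10, 10, 10, 10, 10, 10, 10, 10, 10, 10]
```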

NumPy arrays are n-dimensional, but for purposes of pandas, we will be most interested in one- and two-dimensional arrays. This is because the pandas Series and DataFrame objects operate similarly to one- and two-dimensional arrays, respectively.

To create a two-dimensional NumPy array, you can pass in a list of lists as shown in the following example:

In [19]:
   # create a 2-dimensional array (2x2)
   np.array([[1,2], [3,4]])

Out[19]:
   array([[1, 2],
          [3, 4]])

A more convenient and efficient means is to use the NumPy array's .reshape() method to reorganize a one-dimensional array into two dimensions:

In [20]:
   # create a 20-element 1d-array, and reshape to a 5x4 2d-array
   m = np.arange(0, 20).reshape(5, 4)
   m

Out[20]:
   array([[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15],
          [16, 17, 18, 19]])
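Two details of .reshape() are sketched below: passing -1 for one dimension lets NumPy infer it from the total number of items, and a requested shape whose item count does not match raises a ValueError. The variable names are illustrative:

```python
import numpy as np

flat = np.arange(0, 20)

# -1 asks NumPy to infer that dimension: 20 items / 5 rows = 4 columns
inferred = flat.reshape(5, -1)
print(inferred.shape)    # (5, 4)

# the total number of items must match: 3 * 7 = 21, not 20
try:
    flat.reshape(3, 7)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)   # True
```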

As we have seen, the number of items in an array can be determined by the np.size() function. As the next example demonstrates, for a two-dimensional array, this will return the product of all of the dimensions of the array, which will be equivalent to the total number of items it contains:

In [21]:
   # size of any dimensional array is the # of elements
   np.size(m)

Out[21]:
   20

To determine the number of rows in a two-dimensional array, we can pass 0 as the second argument, which selects the axis to measure:

In [22]:
   # can ask the size along a given axis (0 is rows)
   np.size(m, 0)

Out[22]:
   5

To determine the number of columns in a two-dimensional array, we can pass the value 1:

In [23]:
   # and 1 is the columns
   np.size(m, 1)

Out[23]:
   4
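The same information is also available directly from the array's attributes. The following sketch rebuilds the 5x4 array and reads its .shape, .size, and .ndim attributes; the names rows and cols are illustrative:

```python
import numpy as np

m = np.arange(0, 20).reshape(5, 4)

# shape is a tuple holding the length of each axis
rows, cols = m.shape
print(rows, cols)   # 5 4

# size is the total number of items, the product of the shape
print(m.size)       # 20

# ndim is the number of dimensions
print(m.ndim)       # 2
```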