NumPy arrays provide many built-in methods, several of which are statistical functions useful for data analysis. The following examples demonstrate some of the most useful ones.
The .min() and .max() methods return the minimum and maximum values in an array. The .argmin() and .argmax() methods return the position of the minimum or maximum value; when no axis is given, that position is an index into the flattened array:
In [82]: # demonstrate some of the properties of NumPy arrays
         m = np.arange(10, 19).reshape(3, 3)
         print(m)
         print("{0} min of the entire matrix".format(m.min()))
         print("{0} max of entire matrix".format(m.max()))
         print("{0} position of the min value".format(m.argmin()))
         print("{0} position of the max value".format(m.argmax()))
         print("{0} mins down each column".format(m.min(axis=0)))
         print("{0} mins across each row".format(m.min(axis=1)))
         print("{0} maxs down each column".format(m.max(axis=0)))
         print("{0} maxs across each row".format(m.max(axis=1)))
[[10 11 12]
 [13 14 15]
 [16 17 18]]
10 min of the entire matrix
18 max of entire matrix
0 position of the min value
8 position of the max value
[10 11 12] mins down each column
[10 13 16] mins across each row
[16 17 18] maxs down each column
[12 15 18] maxs across each row
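Because .argmax() reports a position in the flattened array, it can help to translate that position back into row and column coordinates. The following is a minimal sketch of one way to do so, using np.unravel_index, which is not part of the example above; the variable names are only illustrative.

```python
import numpy as np

m = np.arange(10, 19).reshape(3, 3)

# position of the maximum in the flattened array
flat_pos = m.argmax()

# convert the flat position into (row, column) coordinates
row, col = np.unravel_index(flat_pos, m.shape)
print(flat_pos, (int(row), int(col)), m[row, col])

# with an axis argument, argmax returns one position per column or per row
col_argmax = m.argmax(axis=0)
row_argmax = m.argmax(axis=1)
print(col_argmax, row_argmax)
```

Here the flat position 8 corresponds to row 2, column 2, which holds the value 18.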
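The .mean(), .std(), and .var() methods compute the arithmetic mean, standard deviation, and variance of the values in an array. By default, the standard deviation and variance are the population versions (dividing by the number of elements):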
In [83]: # demonstrate included statistical methods
         a = np.arange(10)
         a
Out[83]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [84]: a.mean(), a.std(), a.var()
Out[84]: (4.5, 2.8722813232690143, 8.25)
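If a sample estimate is wanted instead, .std() and .var() accept a ddof ("delta degrees of freedom") argument. The sketch below assumes the same array as above and compares the default population variance with the sample variance obtained with ddof=1; the variable names are illustrative.

```python
import numpy as np

a = np.arange(10)

# default: population variance, divides by N
pop_var = a.var()

# ddof=1: sample variance, divides by N - 1
sample_var = a.var(ddof=1)
sample_std = a.std(ddof=1)

print(pop_var, sample_var, sample_std)
```

This is worth keeping in mind when comparing results with pandas, whose .std() and .var() methods default to ddof=1.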
The sum and product of all the elements in an array can be computed with the .sum() and .prod() methods:
In [85]: # demonstrate sum and prod
         a = np.arange(1, 6)
         a
Out[85]: array([1, 2, 3, 4, 5])

In [86]: a.sum(), a.prod()
Out[86]: (15, 120)
The cumulative sum and product can be computed with the .cumsum() and .cumprod() methods:
In [87]: # and cumulative sum and prod
         a.cumsum(), a.cumprod()
Out[87]: (array([ 1,  3,  6, 10, 15]), array([  1,   2,   6,  24, 120]))
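The last element of a cumulative result equals the corresponding total, and like .min() and .max(), these methods accept an axis argument on multidimensional arrays. The following sketch illustrates both points; the 2 x 3 array m is introduced here only for illustration.

```python
import numpy as np

a = np.arange(1, 6)

# the final running total equals the overall sum / product
running_sum = a.cumsum()
running_prod = a.cumprod()
print(running_sum[-1] == a.sum(), running_prod[-1] == a.prod())

# on a 2-D array, axis selects the direction of the reduction
m = np.arange(1, 7).reshape(2, 3)   # [[1 2 3], [4 5 6]]
col_sums = m.sum(axis=0)            # sum down each column
row_cumsum = m.cumsum(axis=1)       # running sum across each row
print(col_sums)
print(row_cumsum)
```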
The .all() method returns True if all elements of an array are true, and .any() returns True if any element of the array is true:
In [88]: # applying logical operators
         a = np.arange(10)
         (a < 5).any() # any < 5?
Out[88]: True

In [89]: (a < 5).all() # all < 5?
Out[89]: False
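A Boolean array such as (a < 5) can also be summed to count how many elements satisfy the condition, and .all() and .any() accept an axis argument. The sketch below is one illustration of this; the names mask, rows_all, and cols_any are hypothetical.

```python
import numpy as np

a = np.arange(10)
mask = a < 5

# True counts as 1 and False as 0, so the sum is the number of matches
count = mask.sum()
print(count)

# per-row and per-column checks on a 2-D array
m = np.arange(10).reshape(2, 5)     # [[0 1 2 3 4], [5 6 7 8 9]]
rows_all = (m < 5).all(axis=1)      # is every value in the row < 5?
cols_any = (m < 5).any(axis=0)      # is any value in the column < 5?
print(rows_all, cols_any)
```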
The .size property returns the total number of elements in the array across all dimensions:
In [90]: # size is always the total number of elements
         np.arange(10).reshape(2, 5).size
Out[90]: 10
Also, .ndim returns the number of dimensions (axes) of an array:
In [91]: # .ndim will give you the total # of dimensions
         np.arange(10).reshape(2, 5).ndim
Out[91]: 2
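Both properties follow from the array's .shape tuple: .ndim is the length of the shape, and .size is the product of its entries. A short sketch of that relationship, using the same 2 x 5 array:

```python
import numpy as np

m = np.arange(10).reshape(2, 5)

shape = m.shape                       # (2, 5)
ndim_from_shape = len(shape)          # number of axes
size_from_shape = int(np.prod(shape)) # total number of elements

print(shape, ndim_from_shape, size_from_shape)
print(m.ndim == ndim_from_shape, m.size == size_from_shape)
```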
NumPy provides many other statistical and descriptive functions beyond those demonstrated here. This was meant to be a brief overview of NumPy arrays, and the next two chapters on pandas Series and DataFrame objects will dive deeper into these additional methods.