This chapter explored the use of NumPy’s high-performance ndarrays for storing and retrieving data, and for performing common data manipulations concisely and with a reduced chance of errors through functional-style programming. We refer to ndarrays simply by their synonym, arrays.
The chapter examples demonstrated how to create, initialize and refer to individual elements of one- and two-dimensional arrays. We used attributes to determine an array’s size, shape and element type. We showed functions that create arrays of 0s, 1s, specific values or ranges of values. We compared list and array performance with the IPython %timeit magic and saw that arrays are up to two orders of magnitude faster.
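As a brief recap, the following sketch shows one way these array-creation and attribute features might look in code. The variable names and sample values are illustrative assumptions, not the chapter's own examples.

```python
import numpy as np

# A two-dimensional array built from a nested list (sample values are illustrative)
grades = np.array([[87, 96, 70], [100, 87, 90]])

# Attributes that describe the array
element_type = grades.dtype   # an integer type; the exact width is platform dependent
dimensions = grades.ndim      # 2
shape = grades.shape          # (2, 3)
size = grades.size            # 6

# Refer to an individual element by row and column
row0_col1 = grades[0, 1]      # 96

# Functions that create arrays of 0s, 1s, a specific value or a range of values
zeros = np.zeros(5)                      # five 0.0 values
ones = np.ones((2, 4), dtype=int)        # 2-by-4 array of integer 1s
thirteens = np.full((3, 5), 13)          # 3-by-5 array filled with 13
evens = np.arange(0, 10, 2)              # 0, 2, 4, 6, 8
fifths = np.linspace(0.0, 1.0, num=5)    # 0.0, 0.25, 0.5, 0.75, 1.0
```

In an IPython session, a statement such as `%timeit np.arange(1_000_000).sum()` could be compared against an equivalent list-based calculation; the actual speedup depends on the system and the operation.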
We used array operators and NumPy universal functions to perform element-wise calculations on every element of arrays that have the same shape. You also saw that NumPy uses broadcasting to perform element-wise operations between arrays and scalar values, and between arrays of different shapes. We introduced various built-in array methods for performing calculations using all elements of an array, and we showed how to perform those calculations row-by-row or column-by-column. We demonstrated various array slicing and indexing capabilities that are more powerful than those provided by Python’s built-in collections. We demonstrated various ways to reshape arrays. We discussed how to shallow copy and deep copy arrays and other Python objects.
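The following minimal sketch pulls several of these capabilities together: broadcasting, a universal function, calculations by row and by column, slicing and indexing, and the difference between a shallow copy (view) and a deep copy. The names and values are assumptions chosen for illustration.

```python
import numpy as np

numbers = np.arange(1, 7).reshape(2, 3)       # [[1, 2, 3], [4, 5, 6]]

# Broadcasting with a scalar and with a one-dimensional array
doubled = numbers * 2                         # [[2, 4, 6], [8, 10, 12]]
shifted = numbers + np.array([10, 20, 30])    # [[11, 22, 33], [14, 25, 36]]

# A universal function applied element-wise
roots = np.sqrt(np.array([1, 4, 9]))          # [1.0, 2.0, 3.0]

# Calculations over all elements, by column (axis=0) and by row (axis=1)
total = numbers.sum()                         # 21
column_means = numbers.mean(axis=0)           # [2.5, 3.5, 4.5]
row_maxes = numbers.max(axis=1)               # [3, 6]

# Slicing and indexing with a list of column indices
second_column = numbers[:, 1]                 # [2, 5]
outer_columns = numbers[:, [0, 2]]            # [[1, 3], [4, 6]]

# A view (shallow copy) shares the original data; copy() makes a deep copy
view = numbers.view()
independent = numbers.copy()
numbers[0, 0] = 99
# view[0, 0] is now 99, while independent[0, 0] is still 1
```

Because `view` shares its data with `numbers`, the assignment through `numbers` is visible in `view` but not in `independent`.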
In the Intro to Data Science section, we began our multisection introduction to the popular pandas library that you’ll use in many of the data science case study chapters. You learned that many big data applications need more flexible collections than NumPy’s arrays, collections that support mixed data types, custom indexing, missing data, data that’s not structured consistently and data that needs to be manipulated into forms appropriate for the databases and data analysis packages you use.
We showed how to create and manipulate pandas array-like one-dimensional Series and two-dimensional DataFrames. We customized Series and DataFrame indices. You saw pandas’ nicely formatted outputs and customized the precision of floating-point values. We showed various ways to access and select data in Series and DataFrames. We used method describe to calculate basic descriptive statistics for Series and DataFrames. We showed how to transpose DataFrame rows and columns via the T attribute. You saw several ways to sort DataFrames using their index values, their column names, the data in their rows and the data in their columns. You’re now familiar with four powerful array-like collections (lists, arrays, Series and DataFrames) and the contexts in which to use them. We’ll add a fifth, tensors, in the “Deep Learning” chapter.
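The pandas features recapped above might be sketched as follows. The student names, test labels and grades are hypothetical sample data, not necessarily the chapter's own.

```python
import pandas as pd

# A Series with a custom index (names and values are illustrative)
grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])
eva_grade = grades['Eva']                   # 100

# A DataFrame built from a dictionary, with custom row labels
grades_df = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90]},
    index=['Test1', 'Test2', 'Test3'],
)

# Show floating-point values with two digits of precision when displayed
pd.set_option('display.precision', 2)

# Access and select data by label, by position and by single cell
test1_row = grades_df.loc['Test1']
first_row = grades_df.iloc[0]
single_value = grades_df.at['Test2', 'Eva']    # 87

# Basic descriptive statistics and transposition
stats = grades_df.describe()
by_student = grades_df.T

# Sort by row labels, by column names, and by the values in one row
rows_descending = grades_df.sort_index(ascending=False)
columns_by_name = grades_df.sort_index(axis=1)
columns_by_test1 = grades_df.sort_values(by='Test1', axis=1, ascending=False)
```

Here `columns_by_test1` reorders the columns so that the student with the highest Test1 grade comes first.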
In the next chapter, we take a deeper look at strings, string formatting and string methods. We also introduce regular expressions, which we’ll use to match patterns in text. The capabilities you’ll learn will help you prepare for the “Natural Language Processing (NLP)” chapter and other key data science chapters. In the next chapter’s Intro to Data Science section, we’ll introduce pandas data munging—preparing data for use in your database or analytics software. In subsequent chapters, we’ll use pandas for basic time-series analysis and introduce pandas visualization capabilities.