7.15 Wrap-Up

This chapter explored NumPy’s high-performance ndarrays for storing and retrieving data, and for performing common data manipulations concisely, with a reduced chance of errors, using functional-style programming. We refer to ndarrays simply by their synonym, arrays.

The chapter examples demonstrated how to create, initialize and refer to individual elements of one- and two-dimensional arrays. We used attributes to determine an array’s size, shape and element type. We showed functions that create arrays of 0s, 1s, specific values or ranges of values. We compared list and array performance with the IPython %timeit magic and saw that arrays are up to two orders of magnitude faster.
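As a brief refresher, the following sketch revisits the creation functions and attributes summarized above. The variable names and values are illustrative assumptions, not the chapter's own examples.

```python
import numpy as np

# A two-dimensional array built from nested lists (illustrative values)
grid = np.array([[1, 2, 3], [4, 5, 6]])

# Attributes describing the array; the integer dtype is platform-dependent
print(grid.ndim, grid.shape, grid.size, grid.dtype)

# Refer to an individual element by row and column
middle = grid[1, 1]

# Arrays of 0s, 1s and a specific value
zeros = np.zeros(5)               # five floating-point 0s
ones = np.ones((2, 4), dtype=int) # 2-by-4 array of integer 1s
filled = np.full((3, 5), 13)      # 3-by-5 array in which every element is 13

# Arrays from ranges of values
evens = np.arange(0, 10, 2)           # start, stop (exclusive), step
steps = np.linspace(0.0, 1.0, num=5)  # five evenly spaced values, endpoint included
```

Here arange excludes its stop value, whereas linspace includes its endpoint by default.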

We used array operators and NumPy universal functions to perform element-wise calculations on every element of arrays that have the same shape. You also saw that NumPy uses broadcasting to perform element-wise operations between arrays and scalar values, and between arrays of different shapes. We introduced various built-in array methods for performing calculations using all elements of an array, and we showed how to perform those calculations row-by-row or column-by-column. We demonstrated various array slicing and indexing capabilities that are more powerful than those provided by Python’s built-in collections. We demonstrated various ways to reshape arrays. We discussed how to shallow copy and deep copy arrays and other Python objects.
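The next sketch pulls several of these capabilities together in one place: element-wise operations, broadcasting, calculation methods by row and column, indexing and slicing, reshaping, and the difference between a view (shallow copy) and a deep copy. The grades and weights arrays are hypothetical data chosen for illustration.

```python
import numpy as np

# Hypothetical data: two students' grades on three exams
grades = np.array([[87, 96, 70], [100, 87, 90]])

# Broadcasting a scalar across every element
curved = grades + 5

# Broadcasting a one-dimensional array across each row: shapes (2, 3) and (3,)
weights = np.array([0.5, 0.3, 0.2])
weighted = grades * weights

# A universal function applied element-wise
roots = np.sqrt(grades)

# Calculation methods over all elements, by column (axis=0) and by row (axis=1)
total = grades.sum()
column_means = grades.mean(axis=0)
row_means = grades.mean(axis=1)

# Slicing and Boolean indexing
first_column = grades[:, 0]
high = grades[grades >= 90]

# Reshaping to one row of six elements
flat = grades.reshape(1, 6)

# A view shares the original array's data; a deep copy does not
view = grades.view()
deep = grades.copy()
grades[0, 0] = 0   # visible through view, not through deep
print(view[0, 0], deep[0, 0])
```

Because view shares the original's data, the assignment to grades[0, 0] shows through view, while deep keeps the value it had when the copy was made.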

In the Intro to Data Science section, we began our multisection introduction to the popular pandas library that you’ll use in many of the data science case study chapters. You learned that many big data applications need more flexible collections than NumPy’s arrays, collections that support mixed data types, custom indexing, missing data, data that’s not structured consistently and data that needs to be manipulated into forms appropriate for the databases and data analysis packages you use.

We showed how to create and manipulate pandas array-like one-dimensional Series and two-dimensional DataFrames. We customized Series and DataFrame indices. You saw pandas’ nicely formatted outputs and customized the precision of floating-point values. We showed various ways to access and select data in Series and DataFrames. We used method describe to calculate basic descriptive statistics for Series and DataFrames. We showed how to transpose DataFrame rows and columns via the T attribute. You saw several ways to sort DataFrames using their index values, their column names, the data in their rows and the data in their columns. You’re now familiar with four powerful array-like collections—lists, arrays, Series and DataFrames—and the contexts in which to use them. We’ll add a fifth—tensors—in the “Deep Learning” chapter.
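The pandas operations recapped above can be sketched as follows. The names, index labels and grade values are assumptions made for illustration. The sketch assumes pandas is installed.

```python
import pandas as pd

# Show floating-point values with two digits of precision in formatted output
pd.set_option('display.precision', 2)

# A Series with a custom index (hypothetical names and grades)
series = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])

# A DataFrame built from a dictionary, with custom row labels
df = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90]},
    index=['Test1', 'Test2', 'Test3'])

# Accessing and selecting data
eva_grade = series['Eva']          # Series element by label
test1 = df.loc['Test1']            # row by label
second_row = df.iloc[1]            # row by integer position
one_value = df.at['Test2', 'Eva']  # single value by row and column label
eva_high = df[df['Eva'] >= 90]     # rows selected by a Boolean condition

# Basic descriptive statistics and transposition
stats = df.describe()
transposed = df.T

# Sorting by index values, column names, row data and column data
by_index = df.sort_index(ascending=False)
by_column_names = df.sort_index(axis=1)
by_row_data = df.sort_values(by='Test1', axis=1, ascending=False)
by_column_data = df.sort_values(by='Eva')

print(stats)
```

With axis=1, both sort_index and sort_values rearrange the columns rather than the rows.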

In the next chapter, we take a deeper look at strings, string formatting and string methods. We also introduce regular expressions, which we’ll use to match patterns in text. The capabilities you’ll learn will help you prepare for the “Natural Language Processing (NLP)” chapter and other key data science chapters. In the next chapter’s Intro to Data Science section, we’ll introduce pandas data munging—preparing data for use in your database or analytics software. In subsequent chapters, we’ll use pandas for basic time-series analysis and introduce pandas visualization capabilities.
