Chapter 4. The pandas Series Object

pandas is a high-performance library that provides a comprehensive set of data structures for manipulating tabular data, with capabilities for fast indexing, automatic alignment, reshaping, grouping, joining, and statistical analysis.

The two primary data structures in pandas are the Series and the DataFrame objects. In this chapter, we will examine the Series object and how it builds on the features of a NumPy ndarray to provide operations such as indexing, axis labeling, alignment, handling of missing data, and merging across multiple series of data.

In this chapter, we will cover the following topics:

  • Creating and initializing a Series and its index
  • Determining the shape of a Series object
  • Heads, tails, uniqueness, and counts of values
  • Looking up values in a Series object
  • Boolean selection
  • Alignment via index labels
  • Arithmetic operations on a Series object
  • Reindexing a Series object
  • Applying arithmetic operations on Series objects
  • The special case of Not-A-Number (NaN)
  • Slicing Series objects

The Series object

The Series is the primary building block of pandas. A Series represents a one-dimensional labeled indexed array based on the NumPy ndarray. Like an array, a Series can hold zero or more values of any single data type.

A pandas Series differs from a NumPy array by adding an associated set of labels, which are used to index and efficiently access elements by label value rather than only by integer position. This labeled index is a key feature of the Series (and, as we will see, of the DataFrame) and adds significant power for accessing elements compared with a NumPy array.

A Series always has an index, even if one is not specified. In this default case, pandas creates an index of sequential integers starting from zero. This default behavior makes a Series initially appear very similar to a NumPy array. This is by design: a Series was originally derived from the NumPy ndarray, which allowed it to be used by existing NumPy code that relied on integer-based position lookup. In recent versions of pandas, this derivation from ndarray has been removed, but the Series remains mostly API compatible.
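The default index can be seen in a minimal sketch such as the following. The variable names and sample values are illustrative only and do not come from the book's own examples.

```python
import numpy as np
import pandas as pd

# No index is given, so pandas supplies sequential integer labels from zero.
s = pd.Series([10, 11, 12])

print(s.index)        # the default index object
print(list(s.index))  # [0, 1, 2]

# The underlying values can still be handed to NumPy code as an ndarray.
values = s.to_numpy()
print(np.sum(values))  # 33
```

Even in this default case, the integers are labels that happen to coincide with positions, not positions themselves.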

Even though a Series with a default integer index looks much like a NumPy array, its elements are accessed not by integer position but by the values in the index (referred to as labels). The pandas library uses the provided labels to look up the corresponding values. Unlike array positions, index labels do not need to be integers. They can be repeated, they can form hierarchical sets of labels, and they are integral to a pandas concept known as automatic alignment of values by index label.
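As a rough illustration of label-based lookup, the sketch below uses string labels, a repeated label, and integer labels that deliberately do not match positions. The data is made up for the example, and the explicit .loc and .iloc accessors are used here to keep label and position lookups unambiguous.

```python
import pandas as pd

# String labels, with the label 'a' appearing twice.
s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'a'])

single = s.loc['b']    # a unique label returns a single value
repeated = s.loc['a']  # a repeated label returns every matching row as a Series
print(single)
print(repeated)

# Integer labels that are not in positional order.
s2 = pd.Series([10, 20, 30], index=[2, 0, 1])
by_label = s2.loc[0]      # the value whose label is 0
by_position = s2.iloc[0]  # the value stored first
print(by_label, by_position)
```

The last two lookups return different values because label 0 is attached to the second element, not the first.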

This automatic alignment is arguably the most significant change that a Series makes over ndarray. Operations applied across multiple pandas objects (addition is a simple example) are not blindly applied to the values in order of position. The pandas library first aligns the two objects by their index labels and then applies the operation to the values with matching labels. This is, in a way, a simple type of join, and it allows you to associate data with common index labels without any effort.
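A small sketch of alignment during addition, using two made-up Series whose labels overlap only partially and appear in different orders:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['c', 'a', 'd'])

# Values are matched by label, not by position:
# 'a' pairs 1 with 20, and 'c' pairs 3 with 10.
total = s1 + s2
print(total)

# Labels present in only one operand ('b' and 'd') have no partner,
# so the result holds NaN for them.
print(total.isna().sum())  # 2
```

A purely positional addition would have paired 1 with 10, 2 with 20, and 3 with 30, which is not what happens here.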

A pandas index is a first-class component of pandas. pandas provides various specializations of indexes for different data types, each highly optimized for that specific type of data, be it integers, floats, strings, datetime objects, or any other type of hashable object. Additionally, a Series can be reindexed into other types of indexes, effectively providing different views into the Series object using different indexes.
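The following sketch shows one specialized index type and a simple reindexing. The labels and values are invented for illustration, and the label 'z' is deliberately absent from the original Series.

```python
import pandas as pd

# Building a Series on a range of dates produces a datetime-specialized index.
dates = pd.date_range('2024-01-01', periods=3)
by_date = pd.Series([1.0, 2.0, 3.0], index=dates)
index_type = type(by_date.index).__name__
print(index_type)  # DatetimeIndex

# reindex returns a new Series arranged by the requested labels.
s = pd.Series([1.0, 2.0, 3.0], index=['a', 'b', 'c'])
r = s.reindex(['c', 'a', 'z'])
print(r)  # 'z' has no value in s, so it is filled with NaN

# The original Series is left unchanged.
print(list(s.index))  # ['a', 'b', 'c']
```

Because reindex returns a new object, the same underlying data can be viewed through several different indexes without modifying the original.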

This ability to dynamically construct alternative views of data using ad hoc indexes establishes an environment for interactive data manipulation, where data can stay in a single structure but be easily morphed into different views. This makes it possible to explore information interactively and discover meaning intuitively without being overburdened by the structure of the data, as is often the case with relational tools such as SQL.
