To clarify, *add* data.
—Edward R. Tufte
In This Chapter
The Pandas DataFrame, which is built on top of the NumPy array, is probably the most commonly used data structure. DataFrames are like supercharged spreadsheets in code. They are one of the primary tools used in data science. This chapter looks at creating DataFrames, manipulating DataFrames, accessing data in DataFrames, and manipulating that data.
A Pandas DataFrame, like a spreadsheet, is made up of columns and rows. Each column is a pandas.Series object. A DataFrame is, in some ways, similar to a two-dimensional NumPy array, with labels for the columns and index. Unlike a NumPy array, however, a DataFrame can contain different data types. You can think of a pandas.Series object as a one-dimensional NumPy array with labels. The pandas.Series object, like a NumPy array, can contain only one data type. The pandas.Series object supports many of the same methods you have seen with arrays, such as min(), max(), mean(), and median().
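For example, these summary methods work on a small pandas.Series directly (a minimal sketch; the values here are arbitrary):

```python
import pandas as pd

# A Series is like a labeled one-dimensional array with a single dtype.
ages = pd.Series([43, 23, 78, 56], index=['a', 'b', 'c', 'd'])

print(ages.min())     # 23
print(ages.max())     # 78
print(ages.mean())    # 50.0
print(ages.median())  # 49.5
```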
The usual convention is to import the Pandas package aliased as pd
:
import pandas as pd
You can create DataFrames with data from many sources, including dictionaries and lists and, more commonly, by reading files. You can create an empty DataFrame by using the DataFrame
constructor:
df = pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []
As a best practice, though, DataFrames should be initialized with data.
You can create DataFrames from a list of dictionaries or from a dictionary, where each key is a column label with the values for that key holding the data for the column. Listing 9.1 shows how to create a DataFrame by creating a list of data for each column and then creating a dictionary with the column names as keys and these lists as the values. The listing shows how to then pass this dictionary to the DataFrame
constructor to construct the DataFrame.
first_names = ['shanda', 'rolly', 'molly', 'frank', 'rip', 'steven', 'gwen', 'arthur']
last_names = ['smith', 'brocker', 'stein', 'bach', 'spencer', 'de wilde', 'mason', 'davis']
ages = [43, 23, 78, 56, 26, 14, 46, 92]
data = {'first': first_names, 'last': last_names, 'ages': ages}
participants = pd.DataFrame(data)
The resulting DataFrame, participants
, looks as follows in Colab or in a Jupyter notebook:
| | first | last | ages |
|---|---|---|---|
| 0 | shanda | smith | 43 |
| 1 | rolly | brocker | 23 |
| 2 | molly | stein | 78 |
| 3 | frank | bach | 56 |
| 4 | rip | spencer | 26 |
| 5 | steven | de wilde | 14 |
| 6 | gwen | mason | 46 |
| 7 | arthur | davis | 92 |
Note
In this chapter, DataFrame tables that result from a code example will be presented as a table after the code.
You can see the column labels across the top, the data in each row, and the index labels to the left.
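The list-of-dictionaries form mentioned above also works: each dictionary becomes one row, with its keys as column labels. A minimal sketch using two of the participants:

```python
import pandas as pd

# Each dictionary supplies one row; keys become column labels.
rows = [
    {'first': 'shanda', 'last': 'smith', 'ages': 43},
    {'first': 'rolly', 'last': 'brocker', 'ages': 23},
]
df = pd.DataFrame(rows)
print(df.columns.tolist())  # ['first', 'last', 'ages']
```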
You can create a list of lists, with each sublist containing the data for one row, in the order of the columns:
data = [["shanda", "smith", 43],
        ["rolly", "brocker", 23],
        ["molly", "stein", 78],
        ["frank", "bach", 56],
        ["rip", "spencer", 26],
        ["steven", "de wilde", 14],
        ["gwen", "mason", 46],
        ["arthur", "davis", 92]]
Then you can use this as the data argument:
participants = pd.DataFrame(data)
participants
You get the same result as when creating a DataFrame from a dictionary:
| | 0 | 1 | 2 |
|---|---|---|---|
| 0 | shanda | smith | 43 |
| 1 | rolly | brocker | 23 |
| 2 | molly | stein | 78 |
| 3 | frank | bach | 56 |
| 4 | rip | spencer | 26 |
| 5 | steven | de wilde | 14 |
| 6 | gwen | mason | 46 |
| 7 | arthur | davis | 92 |
Notice that the resulting DataFrame has been created with integer column names. This is the default if no column names are supplied. You can supply column names explicitly as a list of strings:
column_names = ['first', 'last', 'ages']
Similarly, you can supply index labels as a list:
index_labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
These labels are then used during initialization, using the parameters columns
and index
:
participants = pd.DataFrame(data, columns=column_names, index=index_labels)
| | first | last | ages |
|---|---|---|---|
| a | shanda | smith | 43 |
| b | rolly | brocker | 23 |
| c | molly | stein | 78 |
| d | frank | bach | 56 |
| e | rip | spencer | 26 |
| f | steven | de wilde | 14 |
| g | gwen | mason | 46 |
| h | arthur | davis | 92 |
While creating DataFrames from dictionaries and lists is possible, the vast majority of the time you will create DataFrames from existing data sources. Files are the most common of these data sources. Pandas supplies functions for creating DataFrames from files for many common file types, including CSV, Excel, HTML, JSON, and SQL database connections.
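All these readers (pd.read_csv, pd.read_excel, pd.read_json, pd.read_html, pd.read_sql) follow the same pattern: you point them at a source and get back a DataFrame. As a quick sketch, read_csv can also parse an in-memory buffer, which is handy for experimenting without a file on disk:

```python
import io
import pandas as pd

# read_csv accepts a path or any file-like object; here an
# in-memory buffer stands in for a file.
csv_text = "first,last,ages\nshanda,smith,43\nrolly,brocker,23\n"
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 3)
```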
Say that you want to open a CSV file from the FiveThirtyEight website, https://data.fivethirtyeight.com, under the data set college_majors
. After you unzip and upload the CSV file to Colab, you open it by simply supplying its path to the Pandas read_csv function:
college_majors = pd.read_csv('/content/all-ages.csv')
college_majors
| | Major | Major_category | Total | Unemployment_rate |
|---|---|---|---|---|
| 0 | GENERAL AGRICULTURE | Agriculture & Natural Resources | 128148 | 0.026147 |
| 1 | AGRICULTURE PRODUCTION AND MANAGEMENT | Agriculture & Natural Resources | 95326 | 0.028636 |
| 2 | AGRICULTURAL ECONOMICS | Agriculture & Natural Resources | 33955 | 0.030248 |
| ... | ... | ... | ... | ... |
| 170 | MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION | Business | 102753 | 0.052679 |
| 171 | HISTORY | Humanities & Liberal Arts | 712509 | 0.065851 |
| 172 | UNITED STATES HISTORY | Humanities & Liberal Arts | 17746 | 0.073500 |
Pandas uses the data in the CSV file to determine column labels and column types.
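One quick way to confirm what Pandas inferred is the dtypes attribute. A minimal sketch on a single-row frame with the same shape as the college_majors data:

```python
import pandas as pd

df = pd.DataFrame({'Major': ['GENERAL AGRICULTURE'],
                   'Total': [128148],
                   'Unemployment_rate': [0.026147]})

# Strings are inferred as object, whole numbers as int64,
# and decimals as float64.
print(df.dtypes)
```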
Once you have data loaded into a DataFrame, you should take a look at it. Pandas offers numerous ways of accessing data in a DataFrame. You can look at data by rows, columns, individual cells, or some combination of these. You can also extract data based on its value.
Note
When I first load data that I am unfamiliar with, I start by taking a peek at the top few rows and checking summary statistics on the data. Looking at the top rows of a DataFrame gives me a sense of what the new data looks like and allows me to confirm that the data is what I expect.
To see the top rows of a DataFrame, you can use the head
method, which returns the top five rows:
college_majors.head()
| | Major | Major_category | Total | Unemployment_rate |
|---|---|---|---|---|
| 0 | GENERAL AGRICULTURE | Agriculture & Natural Resources | 128148 | 0.026147 |
| 1 | AGRICULTURE PRODUCTION AND MANAGEMENT | Agriculture & Natural Resources | 95326 | 0.028636 |
| 2 | AGRICULTURAL ECONOMICS | Agriculture & Natural Resources | 33955 | 0.030248 |
| 3 | ANIMAL SCIENCES | Agriculture & Natural Resources | 103549 | 0.042679 |
| 4 | FOOD SCIENCE | Agriculture & Natural Resources | 24280 | 0.049188 |
The head
method takes an optional argument, which specifies the number of rows to return. You would specify the top three rows like this:
college_majors.head(3)
| | Major | Major_category | Total | Unemployment_rate |
|---|---|---|---|---|
| 0 | GENERAL AGRICULTURE | Agriculture & Natural Resources | 128148 | 0.026147 |
| 1 | AGRICULTURE PRODUCTION AND MANAGEMENT | Agriculture & Natural Resources | 95326 | 0.028636 |
| 2 | AGRICULTURAL ECONOMICS | Agriculture & Natural Resources | 33955 | 0.030248 |
The tail
method works in a similar way to head
but returns rows from the bottom. It also takes an optional argument that specifies the number of rows to return:
college_majors.tail()
| | Major | Major_category | Total | Unemployment_rate |
|---|---|---|---|---|
| 168 | HOSPITALITY MANAGEMENT | Business | 200854 | 0.051447 |
| 169 | MANAGEMENT INFORMATION SYSTEMS AND STATISTICS | Business | 156673 | 0.043977 |
| 170 | MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION | Business | 102753 | 0.052679 |
| 171 | HISTORY | Humanities & Liberal Arts | 712509 | 0.065851 |
| 172 | UNITED STATES HISTORY | Humanities & Liberal Arts | 17746 | 0.073500 |
Once I’ve taken a look at some rows from a DataFrame, I like to get a sense of the shape of the data. One tool for doing this is the DataFrame describe
method, which produces various descriptive statistics about the data. You can call describe
with no arguments, as shown here:
college_majors.describe()
| | Total | Unemployment_rate |
|---|---|---|
| count | 1.730000e+02 | 173.000000 |
| mean | 2.302566e+05 | 0.057355 |
| std | 4.220685e+05 | 0.019177 |
| min | 2.396000e+03 | 0.000000 |
| 25% | 2.428000e+04 | 0.046261 |
| 50% | 7.579100e+04 | 0.054719 |
| 75% | 2.057630e+05 | 0.069043 |
| max | 3.123510e+06 | 0.156147 |
This method calculates the count, mean, standard deviation, minimum, maximum, and quantiles for columns containing numeric data. It accepts optional arguments to control which data types are processed and the ranges of the quantiles returned. To change the quantiles, you use the percentiles
argument:
college_majors.describe(percentiles=[0.1, 0.9])
| | Total | Unemployment_rate |
|---|---|---|
| count | 1.730000e+02 | 173.000000 |
| mean | 2.302566e+05 | 0.057355 |
| std | 4.220685e+05 | 0.019177 |
| min | 2.396000e+03 | 0.000000 |
| 10% | 9.775600e+03 | 0.037053 |
| 50% | 7.579100e+04 | 0.054719 |
| 90% | 6.739758e+05 | 0.080062 |
| max | 3.123510e+06 | 0.156147 |
This example specifies percentiles for 10% and 90% rather than the default 25% and 75%. Note that 50% is inserted regardless of the argument.
If you want to see statistics calculated from nonnumeric columns, you can specify which data types are processed. You do this by using the include keyword. The value passed to this keyword should be a sequence of data types. In Pandas, strings are stored with the data type object, so the following includes columns with string data types:

college_majors.describe(include=[object])

(Older code often passes np.object here, but that alias was deprecated in NumPy 1.20 and later removed; use the built-in object instead.) You can also pass the string name of the data type, which for strings is 'object':

college_majors.describe(include=['object'])
So, for strings, you get the count, the number of unique values, the top value, and the frequency of this top value:
| | Major | Major_category |
|---|---|---|
| count | 173 | 173 |
| unique | 173 | 16 |
| top | GEOSCIENCES | Engineering |
| freq | 1 | 29 |
You can pass the string all
instead of a list of data types to produce statistics for all the columns:
college_majors.describe(include='all')
| | Major | Major_category | Total | Unemployment_rate |
|---|---|---|---|---|
| count | 173 | 173 | 1.730000e+02 | 173.000000 |
| unique | 173 | 16 | NaN | NaN |
| top | GEOSCIENCES | Engineering | NaN | NaN |
| freq | 1 | 29 | NaN | NaN |
| mean | NaN | NaN | 2.302566e+05 | 0.057355 |
| std | NaN | NaN | 4.220685e+05 | 0.019177 |
| min | NaN | NaN | 2.396000e+03 | 0.000000 |
| 25% | NaN | NaN | 2.428000e+04 | 0.046261 |
| 50% | NaN | NaN | 7.579100e+04 | 0.054719 |
| 75% | NaN | NaN | 2.057630e+05 | 0.069043 |
| max | NaN | NaN | 3.123510e+06 | 0.156147 |
Note
Where a statistic is not appropriate for a data type, such as the standard deviation for a string, the not-a-number value NaN is inserted.
In case you want to exclude certain data types rather than specify which ones to include, Pandas supplies the exclude
argument, which takes the same types of arguments as include
:
college_majors.describe(exclude=['int'])
Once you have taken an initial peek at a frame using head
or tail
and gotten a sense of the shape of the data using describe
, you can start looking at the data and individual columns, rows, or cells.
This is the participants
DataFrame from earlier in the chapter:
participants
|
first |
last |
ages |
a |
shanda |
smith |
43 |
b |
rolly |
brocker |
23 |
c |
molly |
stein |
78 |
d |
frank |
bach |
56 |
e |
rip |
spencer |
26 |
f |
steven |
de wilde |
14 |
g |
gwen |
mason |
46 |
h |
arthur |
davis |
92 |
To access columns or rows in Pandas DataFrames, you can use bracket syntax. This syntax is convenient and easy to read, which makes it great for interactive sessions where you are exploring and playing with data.
To access a single column, you supply the column name as an argument in brackets, much as you would a dictionary key:
participants['first']
a    shanda
b     rolly
c     molly
d     frank
e       rip
f    steven
g      gwen
h    arthur
Name: first, dtype: object
You can see that this returns the data for the column along with its index labels, its name, and its data type. If a column name does not contain spaces, dashes, or other special characters, and if it is not the same as an existing attribute of the DataFrame, you can access the column as an attribute.
For example, here is how you access the ages
column:
participants.ages
a    43
b    23
c    78
d    56
e    26
f    14
g    46
h    92
Name: ages, dtype: int64
This would not work with the columns first
or last
, as these already exist as attributes of the DataFrame.
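The same clash happens with any column whose name matches an existing DataFrame attribute or method. A quick sketch using count, which is always a DataFrame method, shows why brackets are the safe choice:

```python
import pandas as pd

# 'count' is already a DataFrame method, so a column with that
# name cannot be reached through attribute access.
df = pd.DataFrame({'count': [1, 2, 3]})

print(callable(df.count))    # True: attribute access finds the method
print(df['count'].tolist())  # [1, 2, 3]: brackets reach the column
```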
To access multiple columns, you specify the column label as a list:
participants[['last', 'first']]
| | last | first |
|---|---|---|
| a | smith | shanda |
| b | brocker | rolly |
| c | stein | molly |
| d | bach | frank |
| e | spencer | rip |
| f | de wilde | steven |
| g | mason | gwen |
| h | davis | arthur |
This returns a DataFrame with only the requested columns.
The bracket syntax is overloaded to allow you to grab rows as well as columns. To specify rows, you use a slice as an argument. If the slice uses integers, then those integers represent the row numbers to return. To return rows 3, 4, and 5 of the DataFrame participants
, for example, you can use the slice 3:6
:
participants[3:6]
| | first | last | ages |
|---|---|---|---|
| d | frank | bach | 56 |
| e | rip | spencer | 26 |
| f | steven | de wilde | 14 |
You can also slice using index labels. When you use labels to slice, the last value is included. So to get rows a
, b
, and c
, you slice using a:c
:
participants['a':'c']
| | first | last | ages |
|---|---|---|---|
| a | shanda | smith | 43 |
| b | rolly | brocker | 23 |
| c | molly | stein | 78 |
You can indicate which rows to return by using a list of Booleans. The list should have one Boolean per row: True
for the desired rows, and False
for the others. The following example returns the second, third, and sixth rows:
mask = [False, True, True, False, False, True, False, False]
participants[mask]
| | first | last | ages |
|---|---|---|---|
| b | rolly | brocker | 23 |
| c | molly | stein | 78 |
| f | steven | de wilde | 14 |
The bracket syntax provides a very convenient and easy-to-read way to access data. It is often used in interactive sessions when experimenting with and exploring DataFrames, but it is not optimized for performance with large data sets. The recommended way to index into DataFrames in production code or for large data sets is to use the DataFrame loc
and iloc
indexers. These indexers use a bracket syntax very similar to what you have seen here. The loc
indexer indexes using labels, and iloc
uses index positions.
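The difference between the two indexers is easy to see side by side on a small frame (a sketch; the same rules apply to participants):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30]}, index=['a', 'b', 'c'])

# loc uses labels, and label slices include the endpoint.
print(df.loc['a':'b', 'x'].tolist())  # [10, 20]

# iloc uses positions, and position slices exclude the endpoint.
print(df.iloc[0:2, 0].tolist())       # [10, 20]
```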
With the loc
indexer, you can supply a single label, and the values for that row will be returned. To get the values from the row labeled c
, for example, you simply supply c
as an argument:
participants.loc['c']
first    molly
last     stein
ages        78
Name: c, dtype: object
You can provide a slice of labels, and once again, the last label is included:
participants.loc['c':'f']
| | first | last | ages |
|---|---|---|---|
| c | molly | stein | 78 |
| d | frank | bach | 56 |
| e | rip | spencer | 26 |
| f | steven | de wilde | 14 |
Or you can provide a sequence of Booleans:
mask = [False, True, True, False, False, True, False, False]
participants.loc[mask]
| | first | last | ages |
|---|---|---|---|
| b | rolly | brocker | 23 |
| c | molly | stein | 78 |
| f | steven | de wilde | 14 |
An optional second argument can indicate which columns to return. If you want to return all the rows for the column first
, for example, you specify all rows with a slice, a comma, and the column label:
participants.loc[:, 'first']
a    shanda
b     rolly
c     molly
d     frank
e       rip
f    steven
g      gwen
h    arthur
Name: first, dtype: object
You could provide a list of column labels:
participants.loc[:'c', ['ages', 'last']]
| | ages | last |
|---|---|---|
| a | 43 | smith |
| b | 23 | brocker |
| c | 78 | stein |
Or you could provide a list of Booleans:
participants.loc[:'c', [False, True, True]]
| | last | ages |
|---|---|---|
| a | smith | 43 |
| b | brocker | 23 |
| c | stein | 78 |
The iloc
indexer enables you to use index positions to select rows and columns. Much as you’ve seen before with brackets, you can use a single value to specify a single row:
participants.iloc[3]
first    frank
last      bach
ages        56
Name: d, dtype: object
Or you can specify multiple rows by using a slice:
participants.iloc[1:4]
| | first | last | ages |
|---|---|---|---|
| b | rolly | brocker | 23 |
| c | molly | stein | 78 |
| d | frank | bach | 56 |
You can, optionally, indicate which column to return by using a second slice:
participants.iloc[1:4, :2]
| | first | last |
|---|---|---|
| b | rolly | brocker |
| c | molly | stein |
| d | frank | bach |
A powerful feature of DataFrames is the ability to select data based on values. You can use comparison operators with columns to see which values meet some condition. For example, if you want to see which rows of the college_majors
DataFrame have the value Humanities & Liberal Arts
as a major category, you can use the equality operator (==
):
college_majors.Major_category == 'Humanities & Liberal Arts'
0      False
1      False
2      False
3      False
       ...
169    False
170    False
171     True
172     True
Name: Major_category, Length: 173, dtype: bool
This produces a pandas.Series
object that contains True
for every row that matches the condition. A series of Booleans is mildly interesting, but the real power comes when you combine it with an indexer to filter the results. Remember that loc
returns rows for every True
value of an input sequence. You can make a condition based on a comparison operator and a column, for example, as shown here for the greater-than operator and the column Total:
total_mask = college_majors.loc[:, 'Total'] > 1200000
You can use the result as a mask to select only the rows that meet this condition:
top_majors = college_majors.loc[total_mask]
top_majors
| | Major | Major_category | Total | Unemployment_rate |
|---|---|---|---|---|
| 25 | GENERAL EDUCATION | Education | 1438867 | 0.043904 |
| 28 | ELEMENTARY EDUCATION | Education | 1446701 | 0.038359 |
| 114 | PSYCHOLOGY | Psychology & Social Work | 1484075 | 0.069667 |
| 153 | NURSING | Health | 1769892 | 0.026797 |
| 158 | GENERAL BUSINESS | Business | 2148712 | 0.051378 |
| 159 | ACCOUNTING | Business | 1779219 | 0.053415 |
| 161 | BUSINESS MANAGEMENT AND ADMINISTRATION | Business | 3123510 | 0.058865 |
You can use the min() method to check that every row in the resulting DataFrame meets the condition:

top_majors.Total.min()
1438867
Now say that you want to see which major categories have the lowest unemployment rates. You can use describe on a single column as well as on a full DataFrame. If you use describe on the column Unemployment_rate, for example, you can see that the cutoff for the bottom quartile (the 25% row) is 0.046261:
college_majors.Unemployment_rate.describe()
count    173.000000
mean       0.057355
std        0.019177
min        0.000000
25%        0.046261
50%        0.054719
75%        0.069043
max        0.156147
Name: Unemployment_rate, dtype: float64
You can create a mask for all rows with an unemployment rate less than or equal to this:
employ_rate_mask = college_majors.loc[:, 'Unemployment_rate'] <= 0.046261
And you can use this mask to produce a DataFrame with only these rows:
employ_rate_majors = college_majors.loc[employ_rate_mask]
Then you can use the pandas.Series
object’s unique
method to see which major categories are in the resulting DataFrame:
employ_rate_majors.Major_category.unique()
array(['Agriculture & Natural Resources', 'Education', 'Engineering',
       'Biology & Life Science', 'Computers & Mathematics',
       'Humanities & Liberal Arts', 'Physical Sciences', 'Health',
       'Business'], dtype=object)
All these categories have at least one row with an unemployment rate that meets the condition.
You can use the three Boolean operators AND (&
), OR (|
), and NOT (~
) with the results of your conditions. You can use &
or |
to combine conditions and create more complex ones. You can use ~
to create a mask that is the opposite of your condition.
For example, you can use AND to create a new mask based on the previous ones to see which major categories of the most popular majors have a low unemployment rate. To do this, you use the &
operator between your existing masks to produce a new one:
total_rate_mask = employ_rate_mask & total_mask
total_rate_mask
0      False
1      False
2      False
3      False
4      False
       ...
168    False
169    False
170    False
171    False
172    False
Length: 173, dtype: bool
By looking at the resulting DataFrame, you can see which of the most popular majors have the lowest unemployment rates:
college_majors.loc[total_rate_mask]
| | Major | Major_category | Total | Unemployment_rate |
|---|---|---|---|---|
| 25 | GENERAL EDUCATION | Education | 1438867 | 0.043904 |
| 28 | ELEMENTARY EDUCATION | Education | 1446701 | 0.038359 |
| 153 | NURSING | Health | 1769892 | 0.026797 |
You can use the ~ operator with your employment rate mask to create a DataFrame whose rows all have an unemployment rate above the bottom-quartile cutoff:

lower_rate_mask = ~employ_rate_mask
lower_rate_majors = college_majors.loc[lower_rate_mask]
You can check this work by using the min method on the Unemployment_rate column to see that the smallest remaining value is above the cutoff:

lower_rate_majors.Unemployment_rate.min()
0.046261360999999994
To select all the rows that either fit the top majors condition or the employment rate condition, you can use the |
operator:
college_majors.loc[total_mask | employ_rate_mask]
The resulting DataFrame contains all the rows that fit either condition.
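A quick sanity check on toy data (not the college_majors file) confirms how | behaves: every row that satisfies at least one of the masks appears in the result:

```python
import pandas as pd

df = pd.DataFrame({'Total': [100, 2000, 50],
                   'rate': [0.01, 0.09, 0.02]})

total_mask = df['Total'] > 1000   # [False, True, False]
rate_mask = df['rate'] <= 0.02    # [True, False, True]

# OR keeps rows matching either condition; here, all three rows.
either = df.loc[total_mask | rate_mask]
print(len(either))  # 3
```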
Once you have the data you need in a DataFrame, you might want to change the DataFrame. You can rename columns or indexes, you can add new columns and rows, and you can delete columns and rows.
Changing the label of a column is simple using the DataFrame rename
method. This is how you can use the DataFrame columns attribute to look at the current column names:
participants.columns
Index(['first', 'last', 'ages'], dtype='object')
You can then rename the columns of your choice by providing a dictionary mapping each old column name to the new one. For example, here is how you change the label of the column ages
to Age
:
participants.rename(columns={'ages': 'Age'})
| | first | last | Age |
|---|---|---|---|
| a | shanda | smith | 43 |
| b | rolly | brocker | 23 |
| c | molly | stein | 78 |
| d | frank | bach | 56 |
| e | rip | spencer | 26 |
| f | steven | de wilde | 14 |
| g | gwen | mason | 46 |
| h | arthur | davis | 92 |
By default, the rename
method returns a new DataFrame using the new column labels. So, if you check your original DataFrame's column names again, you see the old column name:
participants.columns
Index(['first', 'last', 'ages'], dtype='object')
This is how many DataFrame methods work (preserving the original state). Many of these methods offer an optional inplace
argument, which, if set to True
, changes the original DataFrame:
participants.rename(columns={'ages': 'Age'}, inplace=True)
participants.columns
Index(['first', 'last', 'Age'], dtype='object')
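An alternative to inplace=True, and arguably a clearer one, is simply to assign the returned DataFrame back to the same variable. A sketch of the same rename done that way, on a minimal version of the frame:

```python
import pandas as pd

participants = pd.DataFrame({'first': ['shanda'], 'ages': [43]})

# rename returns a new DataFrame; assigning it back to the variable
# has the same net effect as inplace=True.
participants = participants.rename(columns={'ages': 'Age'})
print(participants.columns.tolist())  # ['first', 'Age']
```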
You can use the indexer syntax to create new columns. To do so, you simply assign a list of values to a new column label, using the same bracket syntax you use to access an existing column:
participants['Zip Code'] = [94702, 97402, 94223, 94705, 97503, 94705, 94111, 95333]
participants
| | first | last | Age | Zip Code |
|---|---|---|---|---|
| a | shanda | smith | 43 | 94702 |
| b | rolly | brocker | 23 | 97402 |
| c | molly | stein | 78 | 94223 |
| d | frank | bach | 56 | 94705 |
| e | rip | spencer | 26 | 97503 |
| f | steven | de wilde | 14 | 94705 |
| g | gwen | mason | 46 | 94111 |
| h | arthur | davis | 92 | 95333 |
You can use operations between columns such as string addition to create values for a new column. If you decide you want to add a column with participants’ full names, you can construct the values from the existing columns for their first and last names:
participants['Full Name'] = (
    participants.loc[:, 'first'] + participants.loc[:, 'last']
)
participants
| | first | last | Age | Zip Code | Full Name |
|---|---|---|---|---|---|
| a | shanda | smith | 43 | 94702 | shandasmith |
| b | rolly | brocker | 23 | 97402 | rollybrocker |
| c | molly | stein | 78 | 94223 | mollystein |
| d | frank | bach | 56 | 94705 | frankbach |
| e | rip | spencer | 26 | 97503 | ripspencer |
| f | steven | de wilde | 14 | 94705 | stevende wilde |
| g | gwen | mason | 46 | 94111 | gwenmason |
| h | arthur | davis | 92 | 95333 | arthurdavis |
You can update a column by using the same syntax. For example, if you decide that the values in the full name column should have a white space between the names, you can just assign new values by using the same column name:
participants['Full Name'] = (
    participants.loc[:, 'first'] + ' ' + participants.loc[:, 'last']
)
participants
| | first | last | Age | Zip Code | Full Name |
|---|---|---|---|---|---|
| a | shanda | smith | 43 | 94702 | shanda smith |
| b | rolly | brocker | 23 | 97402 | rolly brocker |
| c | molly | stein | 78 | 94223 | molly stein |
| d | frank | bach | 56 | 94705 | frank bach |
| e | rip | spencer | 26 | 97503 | rip spencer |
| f | steven | de wilde | 14 | 94705 | steven de wilde |
| g | gwen | mason | 46 | 94111 | gwen mason |
| h | arthur | davis | 92 | 95333 | arthur davis |
Pandas gives you many ways to change data in a DataFrame. You can set values by using the same indexers you used before. You can do operations on whole DataFrames or on individual columns. And you can apply functions to change elements in a column or create new values from multiple rows or columns.
To change data using an indexer, you select the location where you want the new data to reside in the same way you select data to view, and then you assign a new value. To change arthur in row h to Paul, for example, you can use loc:

participants.loc['h', 'first'] = 'Paul'
participants
| | first | last | Age | Zip Code | Full Name |
|---|---|---|---|---|---|
| a | shanda | smith | 43 | 94702 | shanda smith |
| b | rolly | brocker | 23 | 97402 | rolly brocker |
| c | molly | stein | 78 | 94223 | molly stein |
| d | frank | bach | 56 | 94705 | frank bach |
| e | rip | spencer | 26 | 97503 | rip spencer |
| f | steven | de wilde | 14 | 94705 | steven de wilde |
| g | gwen | mason | 46 | 94111 | gwen mason |
| h | Paul | davis | 92 | 95333 | arthur davis |
Alternatively, you can use iloc to set the age of frank, in row d (position 3), to 99:

participants.iloc[3, 2] = 99
participants
| | first | last | Age | Zip Code | Full Name |
|---|---|---|---|---|---|
| a | shanda | smith | 43 | 94702 | shanda smith |
| b | rolly | brocker | 23 | 97402 | rolly brocker |
| c | molly | stein | 78 | 94223 | molly stein |
| d | frank | bach | 99 | 94705 | frank bach |
| e | rip | spencer | 26 | 97503 | rip spencer |
| f | steven | de wilde | 14 | 94705 | steven de wilde |
| g | gwen | mason | 46 | 94111 | gwen mason |
| h | Paul | davis | 92 | 95333 | arthur davis |
This should seem fairly intuitive if you think of it as a variation on the indexed assignment you have used with lists and dictionaries.
Earlier in this chapter, you used operations between columns to construct values for a new column. You can also use in-place operators such as +=
, -=
, and /=
, to change values in a column. To subtract 1 from the age of each participant, for example, you can use the -=
operator:
participants.Age -= 1
participants
| | first | last | Age | Zip Code | Full Name |
|---|---|---|---|---|---|
| a | shanda | smith | 42 | 94702 | shanda smith |
| b | rolly | brocker | 22 | 97402 | rolly brocker |
| c | molly | stein | 77 | 94223 | molly stein |
| d | frank | bach | 98 | 94705 | frank bach |
| e | rip | spencer | 25 | 97503 | rip spencer |
| f | steven | de wilde | 13 | 94705 | steven de wilde |
| g | gwen | mason | 45 | 94111 | gwen mason |
| h | Paul | davis | 91 | 95333 | arthur davis |
The replace Method

The replace method finds and replaces values across a DataFrame. For example, you can use it to replace the name rolly with Smiley:
participants.replace('rolly', 'Smiley')
| | first | last | Age | Zip Code | Full Name |
|---|---|---|---|---|---|
| a | shanda | smith | 42 | 94702 | shanda smith |
| b | Smiley | brocker | 22 | 97402 | rolly brocker |
| c | molly | stein | 77 | 94223 | molly stein |
| d | frank | bach | 98 | 94705 | frank bach |
| e | rip | spencer | 25 | 97503 | rip spencer |
| f | steven | de wilde | 13 | 94705 | steven de wilde |
| ... | ... | ... | ... | ... | ... |
This method also works with regular expressions. Here is how you construct a regular expression that matches words starting with s and replaces the s with S (note the backreference \2, which keeps the rest of each matched word):

participants.replace(r'(s)([a-z]+)', r'S\2', regex=True)
| | first | last | Age | Zip Code | Full Name |
|---|---|---|---|---|---|
| a | Shanda | Smith | 42 | 94702 | Shanda Smith |
| b | rolly | brocker | 22 | 97402 | rolly brocker |
| c | molly | Stein | 77 | 94223 | molly Stein |
| d | frank | bach | 98 | 94705 | frank bach |
| e | rip | Spencer | 25 | 97503 | rip Spencer |
| f | Steven | de wilde | 13 | 94705 | Steven de wilde |
| g | gwen | mason | 45 | 94111 | gwen mason |
| h | Paul | davis | 91 | 95333 | arthur davis |
Both DataFrames and the pandas.Series
object have an apply()
method that can call a function on values. In the case of a pandas.Series
object, the apply()
method calls a function of your choosing on every value in the pandas.Series
object individually.
Say that you define a function that capitalizes any string passed to it:
def cap_word(w):
    return w.capitalize()
Then, if you pass it as an argument to apply()
on the column first
, it capitalizes each first name:
participants.loc[:, 'first'].apply(cap_word)
a    Shanda
b     Rolly
c     Molly
d     Frank
e       Rip
f    Steven
g      Gwen
h      Paul
Name: first, dtype: object
In the case of a DataFrame, apply
takes a row as an argument, enabling you to produce new values from the columns of that row. Say that you define a function that uses values from the columns first
and Age
:
def say_hello(row):
    return f'{row["first"]} is {row["Age"]} years old.'
You can then apply the function to the whole DataFrame:
participants.apply(say_hello, axis=1)
a     shanda is 42 years old.
b      rolly is 22 years old.
c      molly is 77 years old.
d      frank is 98 years old.
e        rip is 25 years old.
f     steven is 13 years old.
g       gwen is 45 years old.
h       Paul is 91 years old.
dtype: object
You can use this method to call a function across rows or across columns. You use the axis
argument to indicate whether your function should expect a row or a column.
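A small sketch makes the axis distinction concrete (toy data; the same rules apply to participants):

```python
import pandas as pd

df = pd.DataFrame({'Age': [42, 22], 'Zip Code': [94702, 97402]})

# With the default axis=0, the function receives one column (a Series)
# at a time; here it reduces each column to its maximum.
print(df.apply(lambda col: col.max()))

# With axis=1, the function receives one row at a time instead.
print(df.apply(lambda row: row['Age'] + 1, axis=1).tolist())  # [43, 23]
```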
If you are working with DataFrames in Colab, you should try running this snippet:
%load_ext google.colab.data_table
This makes the output of your DataFrames interactive and enables you to filter and select interactively.
A Pandas DataFrame is a powerful tool for working with data in a spreadsheet-like environment. You can create DataFrames from many sources, but creating a DataFrame from a file is the most common. You can extend DataFrames with new columns and rows. You can access the data itself by using powerful indexers, which you can also use to set data. DataFrames provide a great way to explore and manipulate data.
Use this table to answer the following questions:
| Sample Size (mg) | %P | %Q |
|---|---|---|
| 0.24 | 40 | 60 |
| 2.34 | 34 | 66 |
| 0.0234 | 12 | 88 |
1. Create a DataFrame representing this table.
2. Add a new column labeled Total Q that contains the amount of Q (in mg) for each sample.
3. Divide the columns %P and %Q by 100.