Expanding on pandas data frames in Jupyter

pandas has more built-in functions for working with data frames than we have used so far. Taking one of the data frames from a prior example in this chapter, the Titanic dataset loaded from an Excel file, we can use some of these additional functions to summarize and work with the data.

As a reminder, we load the dataset using the following script:

import pandas as pd
df = pd.read_excel('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls')

We can then inspect the data frame using the info function, which displays the characteristics of the data frame:

df.info()

Some of the interesting points are as follows:

1309 entries
14 columns
The body column has very few non-null values; most were lost
The output gives a good overview of the types of data involved
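
The non-null counts that info reports can also be computed directly, which helps when we want to compare columns. The following is a minimal sketch, assuming a small hypothetical stand-in data frame (the sample variable and its values are made up for illustration and are not the Titanic data):

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data frame: a few made-up rows,
# with most values missing in the body column.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [29.0, None, 2.0, 30.0],
    "body": [None, None, 135.0, None],
})

non_null = sample.count()       # non-null entries per column
missing = sample.isna().sum()   # missing entries per column

# Show both counts side by side, one row per column.
print(pd.DataFrame({"non_null": non_null, "missing": missing}))
```

Running the same two calls against df should show the same kind of gap for the body column that info reports.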

We can also use the describe function, which gives us a statistical breakdown of the numeric columns in the data frame:

df.describe()

This produces the following tabular display:

For each numerical column we have:

Count
Mean
Standard deviation
25th, 50th, and 75th percentiles
Minimum and maximum values
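
Since describe returns a data frame of its own, individual statistics can be looked up by row label and column name. The following is a rough sketch, assuming a hypothetical stand-in data frame with made-up values:

```python
import pandas as pd

# Hypothetical stand-in data; the values are made up for illustration.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [22.0, 38.0, 26.0, 35.0],
    "fare": [7.25, 71.28, 7.93, 53.10],
})

summary = sample.describe()   # by default only numeric columns are summarized

print(list(summary.index))          # the row labels of the summary table
print(summary.loc["mean", "age"])   # a single statistic for a single column
print(summary.loc["50%", "age"])    # the 50th percentile is the median
```

The name column does not appear in the summary because it is not numeric.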

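We can slice rows of interest using the syntax df[12:13], where the first number (which defaults to the first row in the data frame) is the position of the first row to include, and the second number (which defaults to the end of the data frame) is the position at which the slice stops. The row at the stop position is not included, and positions are counted from 0.

Running this slice operation, we get the expected result: the single row at position 12.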
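
The slicing behavior can be checked on a small example. This is a minimal sketch, assuming a hypothetical 20-row stand-in data frame with the default 0-based index:

```python
import pandas as pd

# Hypothetical stand-in data frame with 20 rows and a default 0-based index.
sample = pd.DataFrame({"value": range(100, 120)})

one_row = sample[12:13]    # start included, stop excluded: only position 12
first_three = sample[:3]   # start omitted: begins at the first row
last_three = sample[17:]   # stop omitted: runs through the last row

print(one_row)
print(len(first_three), len(last_three))
```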

Selecting columns from a data frame effectively creates a new data frame, so we can use the head function against the result as well.
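
As an illustration, column selection and head can be combined as follows. This is a sketch only: the stand-in data frame and the column names name, age, and fare are assumptions made for the example.

```python
import pandas as pd

# Hypothetical stand-in rows; the column names "name", "age", and "fare"
# are assumed here for illustration.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "age": [29, 2, 30, 25, 48, 63, 39],
    "fare": [211.3, 151.6, 151.6, 151.6, 26.6, 78.0, 0.0],
})

subset = sample[["name", "age"]]   # selecting a list of columns gives a new data frame

print(subset.head())    # first five rows by default
print(subset.head(2))   # or pass the number of rows wanted
```

Passing a list of column names, as here, returns a data frame; passing a single name such as sample["age"] returns a Series instead, which also supports head.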
