Expanding on pandas data frames in Jupyter

There are more built-in functions for working with data frames than we have used so far. Taking one of the data frames from a prior example in this chapter, the Titanic dataset loaded from an Excel file, we can use additional functions to summarize and work with the dataset.

As a reminder, we load the dataset using the following script:

import pandas as pd
df = pd.read_excel('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.xls')

We can then inspect the data frame using the info function, which displays the characteristics of the data frame:

df.info()  

Some of the interesting points are as follows:

  • 1309 entries
  • 14 columns
  • Very few rows have a valid value in the body column; most are missing
  • The output gives a good overview of the data type of each column
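The points above can be checked programmatically as well as read off the printed summary. The following is a minimal sketch, assuming a small made-up frame named sample as a stand-in for the Titanic data (its column names and values are hypothetical, chosen only so the example runs without a download). It captures the info output in a text buffer and uses count to get the number of non-null values per column.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Titanic data; the values are made up.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [29.0, None, 2.0, 30.0],
    "body": [None, None, 135.0, None],
})

# info() prints to stdout by default; passing buf captures the text instead.
buffer = io.StringIO()
sample.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# count() returns the number of non-null values in each column,
# which makes sparsely populated columns such as "body" easy to spot.
non_null = sample.count()
print(non_null)
```

On the real dataset, the same count call on df would show how few rows have a value in the body column.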

We can also use the describe function, which gives us a statistical breakdown of the numeric columns in the data frame:

df.describe()  

This produces a tabular display in which, for each numeric column, we have:

  • Count
  • Mean
  • Standard deviation
  • 25th, 50th, and 75th percentiles
  • Minimum and maximum values
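To make the list above concrete, here is a minimal sketch that runs describe on a small made-up frame. The frame name sample and its columns are hypothetical, not the Titanic data. By default, describe skips the non-numeric name column and ignores missing values when it computes each statistic.

```python
import pandas as pd

# Hypothetical data; "name" is non-numeric and "age" has one missing value.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "age": [20.0, None, 30.0, 40.0],
    "fare": [10.0, 20.0, 30.0, 40.0],
})

# describe() returns a new data frame: one row per statistic,
# one column per numeric column of the input.
stats = sample.describe()
print(stats)

# Individual statistics can be looked up by row label and column name.
mean_fare = stats.loc["mean", "fare"]
age_count = stats.loc["count", "age"]
```

Because the result is itself a data frame, individual statistics can be pulled out with loc rather than read from the printed table.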

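We can slice rows of interest using the syntax df[12:13], where the first number (which defaults to the first row of the data frame) is the first row included and the second number (which defaults to the end of the data frame) is the position at which the slice stops. As with Python lists, the stop position is excluded, so df[12:13] returns only the row at position 12.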

Running this slice operation we get the expected results:
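The slicing behavior can be confirmed on a small frame. This is a minimal sketch, assuming a made-up 20-row frame named sample with a hypothetical passenger column. It shows that the stop position is excluded and that either bound can be omitted.

```python
import pandas as pd

# Hypothetical 20-row frame with the default integer index 0..19.
sample = pd.DataFrame({"passenger": [f"p{i}" for i in range(20)]})

# Start is included and stop is excluded, so this selects only row 12.
one_row = sample[12:13]

# Omitting the start begins at the first row.
first_three = sample[:3]

# Omitting the stop runs through the last row.
last_two = sample[18:]

print(one_row)
print(first_three)
print(last_two)
```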

Since selecting columns from a data frame effectively creates a new data frame, we can call the head function on the result as well.
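As a minimal sketch of this idea, the example below selects two columns from a made-up frame and then calls head on the result. The frame name sample and the column names are hypothetical and are not taken from the original example. Passing a list of column names inside the brackets returns a data frame, so any data frame function can be chained onto it.

```python
import pandas as pd

# Hypothetical data with more rows than head() shows by default.
sample = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G"],
    "age": [29.0, 1.0, 2.0, 30.0, 25.0, 48.0, 63.0],
    "fare": [211.0, 151.0, 151.0, 151.0, 151.0, 26.0, 77.0],
})

# A list of column names selects those columns as a new data frame.
subset = sample[["name", "age"]]

# head() returns the first five rows by default, or the first n when given n.
default_head = subset.head()
top_three = subset.head(3)

print(default_head)
print(top_three)
```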
