How to do it...

Read in the college dataset. The columns whose names begin with UGDS_ hold the fraction of undergraduate students of each race, expressed as a value between 0 and 1. Use the filter method to select these columns; like='UGDS_' keeps every column whose name contains that substring:

>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv', index_col='INSTNM')
>>> college_ugds_ = college.filter(like='UGDS_')
>>> college_ugds_.head()
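As a minimal sketch of how filter selects columns by label, the following uses a small hypothetical DataFrame (the school names and values are made up for illustration; only the UGDS_ naming pattern comes from the recipe). It also shows the regex parameter, which can anchor the match to the start of the name when a plain substring match would be too loose:

```python
import pandas as pd

# Hypothetical toy data; only the UGDS_ column naming mirrors the college dataset.
toy = pd.DataFrame(
    {
        "CITY": ["Normal", "Birmingham"],
        "UGDS_WHITE": [0.03, 0.59],
        "UGDS_BLACK": [0.94, 0.26],
        "SATVRMID": [424.0, 570.0],
    },
    index=pd.Index(["School A", "School B"], name="INSTNM"),
)

# like= keeps every column whose label contains the given substring.
ugds_like = toy.filter(like="UGDS_")

# regex= allows a stricter test: here the label must start with UGDS_.
ugds_regex = toy.filter(regex=r"^UGDS_")

print(ugds_like.columns.tolist())
print(ugds_regex.columns.tolist())
```

On this toy frame both calls select the same two columns; they would differ only if some other column name contained UGDS_ somewhere other than at the start.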

Now that the DataFrame contains homogeneous column data, operations can be sensibly applied both vertically and horizontally. The count method returns the number of non-missing values. By default, its axis parameter is 0, so it counts down each column:

>>> college_ugds_.count()
UGDS_WHITE    6874
UGDS_BLACK    6874
UGDS_HISP     6874
UGDS_ASIAN    6874
UGDS_AIAN     6874
UGDS_NHPI     6874
UGDS_2MOR     6874
UGDS_NRA      6874
UGDS_UNKN     6874

Because the axis parameter defaults to 0, there is no need to pass it explicitly. For the sake of understanding, though, the count call above is equivalent to both college_ugds_.count(axis=0) and college_ugds_.count(axis='index').

Changing the axis parameter to 1 or 'columns' switches the direction of the operation, so that each row gets a count of its non-missing values:

>>> college_ugds_.count(axis='columns').head()
INSTNM
Alabama A & M University               9
University of Alabama at Birmingham    9
Amridge University                     9
University of Alabama in Huntsville    9
Alabama State University               9
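The axis aliases described above can be checked on a small frame. This is only an illustrative sketch: the toy DataFrame, its labels, and its missing values are hypothetical and stand in for college_ugds_.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a few missing values so the counts differ.
toy = pd.DataFrame(
    {
        "UGDS_WHITE": [0.03, 0.59, np.nan],
        "UGDS_BLACK": [0.94, np.nan, np.nan],
    },
    index=["School A", "School B", "School C"],
)

# Three spellings of the same column-wise count (one result per column).
default_count = toy.count()
count_axis_0 = toy.count(axis=0)
count_axis_index = toy.count(axis="index")

# Two spellings of the row-wise count (one result per row).
count_axis_1 = toy.count(axis=1)
count_axis_columns = toy.count(axis="columns")

print(default_count)
print(count_axis_columns)
```

Note that the result is indexed by column names when axis is 0 or 'index', and by row labels when axis is 1 or 'columns'.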

Instead of counting non-missing values, we can sum all the values in each row. Because the values are fractions of the student body, each row should add up to 1, or very nearly so. The sum method may be used to verify this:

>>> college_ugds_.sum(axis='columns').head()
INSTNM
Alabama A & M University               1.0000
University of Alabama at Birmingham    0.9999
Amridge University                     1.0000
University of Alabama in Huntsville    1.0000
Alabama State University               1.0000
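Since a row total such as 0.9999 is close to 1 but not exactly equal to it, an exact equality test would be too strict. One possible way to automate the check is to compare the totals to 1 within a tolerance using numpy.isclose. The sketch below uses hypothetical data, and the tolerance of 0.001 is an assumption chosen for illustration, not a value taken from the recipe:

```python
import numpy as np
import pandas as pd

# Hypothetical fractions; the third row deliberately does not add up to 1.
toy = pd.DataFrame(
    {
        "UGDS_WHITE": [0.0333, 0.5922, 0.2990],
        "UGDS_BLACK": [0.9353, 0.2600, 0.4192],
        "UGDS_OTHER": [0.0314, 0.1477, 0.2000],
    },
    index=["School A", "School B", "School C"],
)

# Sum across the columns to get one total per row.
row_totals = toy.sum(axis="columns")

# Assumed tolerance: accept totals within 0.001 of 1.
ATOL = 1e-3
sums_to_one = np.isclose(row_totals, 1.0, atol=ATOL)

# Show only the rows whose totals fall outside the tolerance.
print(row_totals[~sums_to_one])
```

With these values, only School C is reported, since its fractions total roughly 0.918.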

To get an idea of the distribution of each column, the median method can be used:

>>> college_ugds_.median(axis='index')
UGDS_WHITE    0.55570
UGDS_BLACK    0.10005
UGDS_HISP     0.07140
UGDS_ASIAN    0.01290
UGDS_AIAN     0.00260
UGDS_NHPI     0.00000
UGDS_2MOR     0.01750
UGDS_NRA      0.00000
UGDS_UNKN     0.01430
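The median is a single summary value per column. As a hedged sketch of how it relates to other quantiles, the example below computes column medians on a hypothetical frame and compares them with the 0.5 row of DataFrame.quantile; the data and the choice of quartiles are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the missing value is skipped by median and quantile.
toy = pd.DataFrame(
    {
        "UGDS_WHITE": [0.1, 0.5, 0.9, np.nan],
        "UGDS_NHPI": [0.0, 0.0, 0.0, 0.2],
    }
)

# One median per column (axis='index' is the default direction).
col_medians = toy.median(axis="index")

# Quartiles give a slightly fuller picture; the 0.5 row is the median.
quartiles = toy.quantile([0.25, 0.5, 0.75])

print(col_medians)
print(quartiles)
```

A median of 0 for a column, as in UGDS_NHPI here, means that at least half of the non-missing values in that column are 0.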
