How to do it...

  1. Read in the college dataset; the columns that begin with UGDS_ represent the percentage of the undergraduate students of a particular race. Use the filter method to select these columns:
>>> college = pd.read_csv('data/college.csv', index_col='INSTNM')
>>> college_ugds_ = college.filter(like='UGDS_')
>>> college_ugds_.head()
  1. Now that the DataFrame contains homogenous column data, operations can be sensibly done both vertically and horizontally. The count method returns the number of non-missing values. By default, its axis parameter is set to 0:
>>> college_ugds_.count()
UGDS_WHITE 6874 UGDS_BLACK 6874 UGDS_HISP 6874 UGDS_ASIAN 6874 UGDS_AIAN 6874 UGDS_NHPI 6874 UGDS_2MOR 6874 UGDS_NRA 6874 UGDS_UNKN 6874
As the axis parameter is almost always set to 0, it is not necessary to do the following, but for purposes of understanding, Step 2 is equivalent to both college_ugds_.count(axis=0) and college_ugds_.count(axis='index').
  1. Changing the axis parameter to 1/columns transposes the operation so that each row of data has a count of its non-missing values:
>>> college_ugds_.count(axis='columns').head()
INSTNM
Alabama A & M University 9 University of Alabama at Birmingham 9 Amridge University 9 University of Alabama in Huntsville 9 Alabama State University 9
  1. Instead of counting non-missing values, we can sum all the values in each row. Each row of percentages should add up to 1. The sum method may be used to verify this:
>>> college_ugds_.sum(axis='columns').head()
INSTNM Alabama A & M University 1.0000 University of Alabama at Birmingham 0.9999 Amridge University 1.0000 University of Alabama in Huntsville 1.0000 Alabama State University 1.0000
  1. To get an idea of the distribution of each column, the median method can be used:
>>> college_ugds_.median(axis='index')
UGDS_WHITE 0.55570 UGDS_BLACK 0.10005 UGDS_HISP 0.07140 UGDS_ASIAN 0.01290 UGDS_AIAN 0.00260 UGDS_NHPI 0.00000 UGDS_2MOR 0.01750 UGDS_NRA 0.00000 UGDS_UNKN 0.01430
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.32.67