How to do it...

  1. Read in the college dataset and select just those columns with undergraduate race percentage information:
>>> import pandas as pd
>>> college = pd.read_csv('data/college.csv', index_col='INSTNM')
>>> college_ugds = college.filter(like='UGDS_')
>>> college_ugds.head()
  2. Use the idxmax method to get the column name with the highest race percentage for each row:
>>> highest_percentage_race = college_ugds.idxmax(axis='columns')
>>> highest_percentage_race.head()
INSTNM
Alabama A & M University               UGDS_BLACK
University of Alabama at Birmingham    UGDS_WHITE
Amridge University                     UGDS_BLACK
University of Alabama in Huntsville    UGDS_WHITE
Alabama State University               UGDS_BLACK
dtype: object
  3. Use the value_counts method with normalize=True to return the proportion of institutions for which each race column holds the maximum:
>>> highest_percentage_race.value_counts(normalize=True)
UGDS_WHITE    0.670352
UGDS_BLACK    0.151586
UGDS_HISP     0.129473
UGDS_UNKN     0.023422
UGDS_ASIAN    0.012074
UGDS_AIAN     0.006110
UGDS_NRA      0.004073
UGDS_NHPI     0.001746
UGDS_2MOR     0.001164
dtype: float64
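
The three steps above can be tried without the college file. The following is a minimal sketch that assumes a small, invented DataFrame in place of data/college.csv; the school names, the CITY column, and all values are hypothetical and are used only to show how filter, idxmax, and value_counts fit together.

```python
import pandas as pd

# Hypothetical toy data standing in for data/college.csv (all values invented)
college = pd.DataFrame(
    {
        "CITY": ["A", "B", "C", "D"],
        "UGDS_WHITE": [0.10, 0.60, 0.30, 0.55],
        "UGDS_BLACK": [0.80, 0.20, 0.25, 0.15],
        "UGDS_HISP": [0.10, 0.20, 0.45, 0.30],
    },
    index=pd.Index(
        ["School 1", "School 2", "School 3", "School 4"], name="INSTNM"
    ),
)

# Step 1: keep only the columns whose name contains 'UGDS_'
college_ugds = college.filter(like="UGDS_")

# Step 2: for each row, the label of the column holding the largest value
highest_percentage_race = college_ugds.idxmax(axis="columns")

# Step 3: proportion of rows for which each column is the maximum
distribution = highest_percentage_race.value_counts(normalize=True)

print(highest_percentage_race)
print(distribution)
```

Because value_counts drops missing values by default (dropna=True), the proportions it returns sum to 1 over the rows that actually have a result.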