How it works...

Step 2 counts the missing values for each school and displays the schools with the most. Because the DataFrame has nine columns, a school can be missing at most nine values, and many schools are missing a value in every column. Step 3 removes the rows in which every value is missing. The dropna method used in step 3 has a how parameter, which defaults to the string any but can also be set to all. With any, it drops rows that contain one or more missing values. With all, it drops only rows in which every value is missing.
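The difference between the two settings of how can be seen on a small example. The following is a minimal sketch, not the recipe's own code: the DataFrame toy, its column names, and the school names are hypothetical stand-ins for the college data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: three percentage columns for four schools.
toy = pd.DataFrame(
    {
        "group_a": [0.6, np.nan, np.nan, 0.3],
        "group_b": [0.3, np.nan, 0.2, 0.3],
        "group_c": [0.1, np.nan, np.nan, 0.4],
    },
    index=["School A", "School B", "School C", "School D"],
)

# Count the missing values in each row and list the largest counts first.
missing_per_school = toy.isnull().sum(axis="columns").sort_values(ascending=False)

# how="any" (the default) drops every row with at least one missing value.
dropped_any = toy.dropna(how="any")

# how="all" drops only the rows in which every value is missing.
dropped_all = toy.dropna(how="all")

print(missing_per_school)
print(dropped_any.index.tolist())
print(dropped_all.index.tolist())
```

In this sketch School C is missing two of its three values, so it is removed by how="any" but kept by how="all".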

In this case, we conservatively drop only the rows that are missing all of their values, because some missing values may simply represent 0 percent. That turned out not to be the case here: no missing values remained after dropna was performed. Had any remained, we could have called fillna(0) to replace them with 0.
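As a rough illustration of that fallback, the sketch below checks for leftover missing values and fills them with 0. The DataFrame partial and its labels are hypothetical, chosen only so that one missing value survives.

```python
import numpy as np
import pandas as pd

# Hypothetical data in which one value is still missing.
partial = pd.DataFrame(
    {"group_a": [0.7, np.nan], "group_b": [0.3, 1.0]},
    index=["School A", "School B"],
)

# Total number of missing values across the whole DataFrame.
remaining = partial.isnull().sum().sum()

# Treat each remaining missing value as 0 percent.
filled = partial.fillna(0)

print(remaining)
print(filled)
```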

Step 4 begins our diversity metric calculation with ge, the greater-than-or-equal-to method. This results in a DataFrame of booleans, which is summed horizontally by setting axis='columns' to give, for each school, the number of races that meet the threshold.
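The comparison and the horizontal sum can be sketched as follows. This assumes the 15% threshold mentioned later in this section; the DataFrame pct and its labels are hypothetical.

```python
import pandas as pd

# Hypothetical percentages for three schools.
pct = pd.DataFrame(
    {
        "group_a": [0.60, 0.50, 0.90],
        "group_b": [0.25, 0.40, 0.05],
        "group_c": [0.15, 0.10, 0.05],
    },
    index=["School A", "School B", "School C"],
)

# Element-wise comparison: True where the value is 0.15 or more.
meets_threshold = pct.ge(0.15)

# Summing booleans across the columns counts the True values in each row.
diversity_metric = meets_threshold.sum(axis="columns")

print(meets_threshold)
print(diversity_metric)
```

Summing works here because pandas treats True as 1 and False as 0.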

Step 5 uses the value_counts method to produce the distribution of our diversity metric. It is quite rare for a school to have three races that each make up 15% or more of its undergraduate student population. Steps 7 and 8 find the two schools that are the most diverse according to our metric. Although they are diverse, it appears that many of their students are not fully accounted for by race and instead fall into the unknown and two or more categories.
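A small sketch of the distribution and of picking out the highest-scoring schools is shown below. The Series values and school names are hypothetical, and sort_values followed by head is one plausible way to find the top schools, not necessarily the one used in steps 7 and 8.

```python
import pandas as pd

# Hypothetical diversity metric for seven schools.
diversity_metric = pd.Series(
    [1, 1, 2, 1, 3, 2, 1],
    index=["School A", "School B", "School C", "School D",
           "School E", "School F", "School G"],
)

# How many schools have each metric value.
distribution = diversity_metric.value_counts()

# The two schools with the highest metric.
top_two = diversity_metric.sort_values(ascending=False).head(2)

print(distribution)
print(top_two)
```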

Step 9 selects the top five schools from the US News article and then looks up their diversity metric in our newly created Series. It turns out that these schools also score highly under our simple ranking system.
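Looking up several schools at once amounts to passing a list of index labels to the Series. The sketch below assumes the Series is indexed by school name; the names and values are hypothetical placeholders, not the schools from the article.

```python
import pandas as pd

# Hypothetical diversity metric indexed by school name.
diversity_metric = pd.Series(
    [1, 4, 2, 5, 3],
    index=["School A", "School B", "School C", "School D", "School E"],
)

# Hypothetical list of schools taken from an external ranking.
ranked_schools = ["School D", "School B", "School E"]

# Select the metric for just those schools, in the order listed.
selected = diversity_metric.loc[ranked_schools]

print(selected)
```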
