How it works...

For this recipe to work properly, we first need to keep only the institutions that have no missing values for UGDS, SATMTMID, and SATVRMID. By default, the dropna method drops any row that has one or more missing values in any column. We use the subset parameter to limit the columns it checks for missing values.
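The effect of the subset parameter can be sketched on a small, made-up DataFrame. Only the three column names come from the recipe; the data, the CITY column, and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only UGDS, SATMTMID and SATVRMID are names from the recipe
college = pd.DataFrame({
    'STABBR': ['TX', 'TX', 'CA', 'CA'],
    'UGDS': [1000.0, np.nan, 500.0, 2000.0],
    'SATMTMID': [500.0, 600.0, np.nan, 550.0],
    'SATVRMID': [480.0, 590.0, 510.0, 560.0],
    'CITY': ['Austin', None, 'Fresno', None],  # assumed extra column
})

# Default behavior: a missing value in ANY column drops the row
all_cols = college.dropna()

# With subset, only the listed columns are checked for missing values
subset = ['UGDS', 'SATMTMID', 'SATVRMID']
filtered = college.dropna(subset=subset)

print(len(all_cols), len(filtered))
```

The last row has a missing CITY but complete SAT and enrollment data, so it survives only when subset is used.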

In step 2, we define a function that calculates the weighted average for just the SATMTMID column. The weighted average differs from an arithmetic mean in that each value is multiplied by some weight. These products are then summed and divided by the sum of the weights. In this case, our weight is the undergraduate student population.
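As a small worked example of this definition, the sketch below computes a weighted average by hand and with NumPy's np.average. The two scores and populations are made-up numbers, not values from the dataset.

```python
import numpy as np

# Hypothetical values: two schools' math scores and undergraduate populations
scores = np.array([500.0, 600.0])     # stands in for SATMTMID
weights = np.array([1000.0, 3000.0])  # stands in for UGDS

# Multiply each value by its weight, sum, then divide by the sum of the weights
manual = (scores * weights).sum() / weights.sum()

# np.average performs the same calculation when given weights
with_numpy = np.average(scores, weights=weights)

# The arithmetic mean ignores the population sizes
plain_mean = scores.mean()

print(manual, with_numpy, plain_mean)
```

The larger school pulls the weighted average toward its own score, which is why the weighted result is higher than the plain mean here.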

In step 3, we pass this function to the apply method. Our function weighted_math_average gets passed a DataFrame of all the original columns for each group. It returns a single scalar value, the weighted average of SATMTMID. At this point, you might think that this calculation is possible using the agg method. Directly replacing apply with agg does not work: agg passes each aggregating column to the function separately as a Series and returns a value for each of those columns, so the function never sees UGDS and SATMTMID together.
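A minimal sketch of this step on made-up data follows. The body of weighted_math_average is reconstructed from the description above and may differ from the recipe's version. Selecting the two needed columns before calling apply is a choice made here so that the sketch behaves the same across pandas versions.

```python
import pandas as pd

# Hypothetical toy data using the recipe's column names
college = pd.DataFrame({
    'STABBR': ['TX', 'TX', 'CA', 'CA'],
    'UGDS': [1000.0, 3000.0, 500.0, 1500.0],
    'SATMTMID': [500.0, 600.0, 520.0, 560.0],
})

def weighted_math_average(df):
    # df is the sub-DataFrame for one group, so both columns are available
    weighted = df['UGDS'] * df['SATMTMID']
    return weighted.sum() / df['UGDS'].sum()

# apply calls the function once per group and collects the scalars in a Series
result = (college
          .groupby('STABBR')[['UGDS', 'SATMTMID']]
          .apply(weighted_math_average))

print(result)
```

Because the function returns one scalar per group, the result is a Series indexed by the grouping column.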

It is actually possible to use agg indirectly by precomputing the product of UGDS and SATMTMID, summing both that product and UGDS for each group, and then dividing the first sum by the second.
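One way to sketch this agg-based route is shown below. The data and the helper column name UGDS_x_SATMTMID are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data using the recipe's column names
college = pd.DataFrame({
    'STABBR': ['TX', 'TX', 'CA', 'CA'],
    'UGDS': [1000.0, 3000.0, 500.0, 1500.0],
    'SATMTMID': [500.0, 600.0, 520.0, 560.0],
})

# Precompute the row-wise product (the column name is hypothetical)
college['UGDS_x_SATMTMID'] = college['UGDS'] * college['SATMTMID']

# agg can now sum each column independently within each group
sums = college.groupby('STABBR')[['UGDS_x_SATMTMID', 'UGDS']].agg('sum')

# Dividing the two sums gives the weighted average per group
weighted = sums['UGDS_x_SATMTMID'] / sums['UGDS']

print(weighted)
```

This works because, once the product is its own column, no aggregation needs to see more than one column at a time.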

Step 6 really shows the versatility of apply. We build a new function that calculates the weighted and arithmetic average of both SAT columns as well as the number of rows for each group. In order for apply to create multiple columns, you must return a Series. The index values are used as column names in the resulting DataFrame. You can return as many values as you want with this method.
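The idea of returning a Series from the applied function can be sketched as follows. The function body, the output labels, and the data are assumptions made for illustration and are not the recipe's exact code.

```python
from collections import OrderedDict

import numpy as np
import pandas as pd

# Hypothetical toy data using the recipe's column names
college = pd.DataFrame({
    'STABBR': ['TX', 'TX', 'CA', 'CA'],
    'UGDS': [1000.0, 3000.0, 500.0, 1500.0],
    'SATMTMID': [500.0, 600.0, 520.0, 560.0],
    'SATVRMID': [480.0, 580.0, 510.0, 550.0],
})

def weighted_average(df):
    # Collect each result under the label that will become its column name
    data = OrderedDict()
    weights = df['UGDS']
    for col in ['SATMTMID', 'SATVRMID']:
        data['weighted_' + col] = np.average(df[col], weights=weights)
        data['mean_' + col] = df[col].mean()
    data['count'] = len(df)
    # Returning a Series makes apply produce one column per index label
    return pd.Series(data)

summary = (college
           .groupby('STABBR')[['UGDS', 'SATMTMID', 'SATVRMID']]
           .apply(weighted_average))

print(summary)
```

Each group contributes one row, and the Series index labels become the column names of the resulting DataFrame.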

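Notice that the OrderedDict class was imported from the collections module, which is part of the standard library. This ordered dictionary is used to store the data. In Python versions before 3.7, a normal dictionary could not be relied on for this, since the language did not guarantee that it preserved insertion order. From Python 3.7 onward, a normal dictionary does preserve insertion order, so it works equally well.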

The pd.Series constructor does have an index parameter that you can use to specify the order, but using an OrderedDict is cleaner.
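For comparison, the index parameter of pd.Series can be sketched like this. The labels and values are made up for illustration.

```python
import pandas as pd

# Hypothetical results stored in a dictionary
values = {'count': 2, 'mean': 550.0, 'weighted': 575.0}

# The desired order is stated separately and passed through index
order = ['weighted', 'mean', 'count']
s = pd.Series(values, index=order)

print(s)
```

The labels have to be repeated in a second list, which is the extra bookkeeping that an ordered dictionary avoids.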