Analyzing changes in college admissions

We can look at trends in college acceptance rates over the last few years. For this analysis, I am using the data from https://www.ivywise.com/ivywise-knowledgebase/admission-statistics.

First, we read in our dataset, show its summary statistics, and use head to check that the first few rows loaded correctly:

df <- read.csv("Documents/acceptance-rates.csv")
summary(df)
head(df)  

We see the summary data for school acceptance rates as follows:

It's interesting to note that the acceptance rate varies so widely, from a low of 5 percent to a high of 41 percent in 2017.

Let us look at the data plots, again, to validate that the data points are correct:

plot(df)  

From the correlation graphics shown, it does not look like we can use the data points from 2007. The graphs show a big divergence between 2007 and the other years, whereas the other three have good correlations.

So, we have 3 consecutive years of data from 25 major US universities. We can convert the data into a time series using a few steps.

First, we create a vector of the average acceptance rates for these colleges over the years 2015-2017. We use the mean function to determine the average across all colleges in our data frame. We have some NA values in our data frame, so we need to tell the mean function to ignore those values (na.rm=TRUE):

myvector <- c(mean(df[["X2015"]],na.rm=TRUE),
    mean(df[["X2016"]],na.rm=TRUE),
    mean(df[["X2017"]],na.rm=TRUE))  

Next, we convert the vector into a time series. The ts function takes the vector, the start and end points, and the frequency of the data points. In our case, the data is yearly, so frequency = 1:

ts <- ts(myvector, start=c(2015), end=c(2017), frequency=1)  

Then plot the time series to get a good visual:

plot(ts)  

So, the clear trend is a decline in acceptance rates across the board: the average rate drops steadily from about .15 in 2015 to about .14 in 2017.

The data looks clean, as the points line up almost in a straight line. We can use this time series to predict the next few years. There are versions of the Holt-Winters algorithm that can predict based on level data, level data plus a trend component, and level data plus a trend component plus a seasonality component. We have a trend, but no seasonality, so we switch off the seasonal component with gamma=FALSE:

# double exponential - models level and trend
fit <- HoltWinters(ts, gamma=FALSE)
fit
Holt-Winters exponential smoothing with trend and without seasonal component.

Call:
HoltWinters(x = ts, gamma = FALSE)

Smoothing parameters:
 alpha: 0.3
 beta : 0.1
 gamma: FALSE

Coefficients:
         [,1]
a  0.14495402
b -0.00415977  

The coefficients are the estimated level, a (about 0.145, or roughly one-seventh), and the estimated trend, b (about -0.004 per year, close to zero). In other words, acceptance rates aren't dropping aggressively, but they are dropping.
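To see what the two coefficients imply, here is a small worked sketch in Python (the language used later in this section). It assumes the usual Holt point forecast for a model with a trend and no seasonality: the forecast h steps ahead is a + h * b. The values of a and b are copied from the output above; the function name holt_forecast and the year labels are only illustrative.

```python
# Coefficients copied from the HoltWinters output above.
a = 0.14495402   # estimated level at the end of the series (2017)
b = -0.00415977  # estimated trend per year


def holt_forecast(level, trend, h):
    """Point forecast h steps ahead for a level-plus-trend model."""
    return level + h * trend


# The series ends in 2017, so steps 1 to 3 correspond to 2018 to 2020.
for h, year in enumerate((2018, 2019, 2020), start=1):
    print(year, round(holt_forecast(a, b, h), 4))
```

Each additional year simply subtracts another 0.004 or so from the level, which is why the forecast is a gently sloping straight line.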

Now that we have a good time series model of the existing data, we can produce a forecast of the next three years and plot it:

install.packages("forecast", repos="http://cran.us.r-project.org")
library(forecast)
forecast(fit, 3)
plot(forecast(fit, 3))

The trend is clearly negative but, as mentioned earlier, it is not a dramatic drop: about half a percentage point a year.

We can do a similar analysis in Python. First, import all of the Python packages we will be using:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rcParams

# Jupyter magic: render plots inline in the notebook
%matplotlib inline
# Default figure size in inches (width, height)
rcParams['figure.figsize'] = 15, 6

Read the college acceptance rates into a data frame:

data = pd.read_csv('Documents/acceptance-rates.csv')
print (data.head())
                School  2017  2016  2015  2007
0      Amherst College   NaN  0.14  0.14  0.18
1       Boston College  0.32  0.32  0.28  0.27
2     Brown University  0.08  0.09  0.09  0.15
3  Columbia University  0.06  0.06  0.06  0.12
4   Cornell University  0.13  0.14  0.15  0.21  

Remove the School column, as we cannot do numeric calculations on strings:

del data['School']
print (data.head())
   2017  2016  2015  2007
0   NaN  0.14  0.14  0.18
1  0.32  0.32  0.28  0.27
2  0.08  0.09  0.09  0.15
3  0.06  0.06  0.06  0.12
4  0.13  0.14  0.15  0.21  

Transpose the data so that each row holds one year:

data = data.transpose()
print (data.head())

We see the dataset transposed to our desired shape as follows:

Plot the data to see what it looks like:

plt.plot(data);  

We see the same slight downtrend for acceptance.
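To put numbers on the plot, we could also compute the average rate per year, as we did with mean and na.rm=TRUE in R. The following is a minimal sketch that assumes only the five rows printed by data.head() earlier, so its averages are not those of the full dataset. The name yearly_mean is illustrative.

```python
import numpy as np
import pandas as pd

# Only the five rows shown by data.head() above; the full CSV has more schools.
data = pd.DataFrame(
    {
        "2017": [np.nan, 0.32, 0.08, 0.06, 0.13],
        "2016": [0.14, 0.32, 0.09, 0.06, 0.14],
        "2015": [0.14, 0.28, 0.09, 0.06, 0.15],
        "2007": [0.18, 0.27, 0.15, 0.12, 0.21],
    }
).transpose()  # rows are years, columns are schools, as in the text

# Average across schools for each year; skipna=True ignores the NaN
# entries, which plays the same role as na.rm=TRUE in R.
yearly_mean = data.mean(axis=1, skipna=True)

# Sort the years chronologically and drop 2007, as in the R analysis.
yearly_mean = yearly_mean.sort_index().drop("2007")
print(yearly_mean)
```

skipna=True is already the pandas default; it is spelled out here only to make the parallel with the R code explicit.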

Using Holt-Winters forecasting in Python was problematic as it required transforming the data further. Overall, it is much more complicated to do the same processing that was straightforward in R in the preceding section.
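For readers who still want to experiment, the level-plus-trend variant (double exponential smoothing) can be sketched in a few lines of plain Python. This is a hand-written illustration, not the code behind R's HoltWinters function. The smoothing parameters are fixed at the alpha and beta values printed in the R output rather than optimized, the starting values are an assumed common choice, and the input series is a rounded placeholder rather than the real yearly means. The name holt_linear is hypothetical.

```python
def holt_linear(series, alpha, beta, horizon):
    """Double exponential smoothing (level + trend, no seasonality).

    Returns point forecasts for 1..horizon steps past the end of series.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations")
    # Assumed starting values: the second observation for the level and
    # the first difference for the trend.
    level = series[1]
    trend = series[1] - series[0]
    for x in series[2:]:
        prev_level = level
        # Blend the new observation with the previous one-step forecast.
        level = alpha * x + (1 - alpha) * (prev_level + trend)
        # Blend the latest change in level with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, horizon + 1)]


# Illustrative placeholder values only, not the real yearly averages.
rates = [0.150, 0.145, 0.140]
print(holt_linear(rates, alpha=0.3, beta=0.1, horizon=3))
```

Because the placeholder series is perfectly linear, the forecasts simply continue the same straight line; with real data the level and trend updates would pull the line toward the most recent observations.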
