We will perform a graphical comparison of the closing values for AAPL
and MSFT
. Using the closing prices DataFrame
, it is simple to plot the values for a specific stock using the .plot()
method of Series
. The following plots the adjusted closing price for AAPL
:
In [9]: # plot the closing prices of AAPL close_px['AAPL'].plot();
The following code plots the adjusted closing price for MSFT
:
In [10]: # plot the closing prices of MSFT close_px['MSFT'].plot();
Both sets of closing values can easily be displayed on a single chart in order to give a side-by-side comparison:
In [11]: # plot MSFT vs AAPL on the same chart close_px[['MSFT', 'AAPL']].plot();
The output is seen in the following screenshot:
Volume data can be plotted using bar charts. We first need to get the volume data, which can be done using the pivotTickersToColumns()
function created earlier:
In [12]: # pivot the volume data into columns volumes = pivotTickersToColumns(raw, "Volume") volumes.tail() Out [12]: Ticker AA AAPL DAL ... MSFT PEP UAL Date ... 2014-12-24 4944200 14479600 4296200 ... 11437800 1604100 2714300 2014-12-26 6355200 33721000 5303100 ... 13197800 1492700 3062200 2014-12-29 7087800 27598900 6656700 ... 14439500 2453800 2874300 2014-12-30 9262100 29881500 7318900 ... 16384700 2134400 2644600 2014-12-31 8842400 41403400 7801700 ... 21552500 3727400 4451200 [5 rows x 9 columns]
We can now use this DataFrame
to plot a bar chart. The following plots the volume for MSFT
:
In [13]: # plot the volume for MSFT msftV = volumes[["MSFT"]] plt.bar(msftV.index, msftV["MSFT"]) plt.gcf().set_size_inches(15,8)
The output is seen in the following screenshot:
A common type of financial graph plots a stock volume relative to its closing price:
In [14]: # draw the price history on the top top = plt.subplot2grid((4,4), (0, 0), rowspan=3, colspan=4) top.plot(close_px['MSFT'].index, close_px['MSFT'], label='MSFT Adjusted Close') plt.title('Microsoft Adjusted Close Price 2012 - 2014') plt.legend(loc=2) # and the volume along the bottom bottom = plt.subplot2grid((4,4), (3,0), rowspan=1, colspan=4) bottom.bar(msftV.index, msftV['MSFT']) plt.title('Microsoft Trading Volume 2012 - 2014') plt.subplots_adjust(hspace=0.75) plt.gcf().set_size_inches(15,8)
The output is seen in the following screenshot:
The simple daily percentage change (without dividends and other factors) is the percentage change in value of a stock over a single day of trading. It is defined by the following formula:
This can be easily calculated in pandas using .shift()
:
In [15]: # calculate daily percentage change daily_pc = close_px / close_px.shift(1) - 1 daily_pc[:5] Out [15]: Ticker AA AAPL DAL ... MSFT PEP UAL Date ... 2012-01-03 NaN NaN NaN ... NaN NaN NaN 2012-01-04 0.024 0.005 -0.004 ... 0.024 0.005 -0.020 2012-01-05 -0.009 0.011 0.039 ... 0.010 -0.008 -0.007 2012-01-06 -0.021 0.010 -0.001 ... 0.015 -0.013 -0.010 2012-01-09 0.029 -0.002 -0.005 ... -0.013 0.005 -0.015 [5 rows x 9 columns]
A quick check shows you that the return for AAPL
on 2011-09-08
is correct:
In [16]: # check the percentage on 2012-01-05 close_px.ix['2012-01-05']['AAPL'] / close_px.ix['2012-01-04'] ['AAPL'] -1 Out [16]: 0.011129061209836699
Plotting the daily percentage change yields the following plot:
In [17]: # plot daily percentage change for AAPL daily_pc["AAPL"].plot();
The output is seen in the following screenshot:
A plot of daily percentage change will tend to look like noise, as shown in the preceding chart. However, when we use the cumulative product of these values, known as the daily cumulative return, then we can see how the value changes over time.
We can calculate the cumulative daily return by taking the cumulative product of the daily percentage change. This calculation is represented by the following equation:
This is actually calculated very succinctly using the following code, which utilizes the .cumprod()
method:
In [18]: # calculate daily cumulative return daily_cr = (1 + daily_pc).cumprod() daily_cr[:5] Out [18]: Ticker AA AAPL DAL ... MSFT PEP UAL Date ... 2012-01-03 NaN NaN NaN ... NaN NaN NaN 2012-01-04 1.024 1.005 0.996 ... 1.024 1.005 0.980 2012-01-05 1.015 1.017 1.035 ... 1.034 0.997 0.973 2012-01-06 0.993 1.027 1.034 ... 1.050 0.985 0.963 2012-01-09 1.022 1.026 1.029 ... 1.036 0.990 0.949 [5 rows x 9 columns]
We can plot the cumulative returns to see how the different stocks compare:
In [19]: # plot all the cumulative returns to get an idea # of the relative performance of all the stocks daily_cr.plot(figsize=(8,6)) plt.legend(loc=2);
The output is seen in the following screenshot:
To calculate the monthly rate of return, we can use a little pandas magic and resample the original daily dates to throw out the days that are not an end of month value as well as forward filling anywhere there are missing values. This can be done using the ffill
method to replace any NaN
values from missing data with the previous value:
In [20]: # resample to end of month and forward fill values monthly = close_px.asfreq('EOM', method="ffill") monthly[:5] Out [20]: Ticker AA AAPL DAL ... MSFT PEP UAL 2012-01-31 9.79 61.51 10.39 ... 26.94 59.78 23.10 2012-02-29 9.83 73.09 9.66 ... 29.15 57.76 20.65 2012-03-30 9.68 80.79 9.77 ... 29.62 60.89 21.50 2012-04-30 9.40 78.69 10.79 ... 29.40 60.57 21.92 2012-05-31 8.29 77.85 11.92 ... 26.98 62.76 25.17 [5 rows x 9 columns]
Note the date of the entries and how they are now only month-end dates. Values have not changed as the resample only selects the dates at the end of the month or fills the value with the value prior to that date if it did not exist in the source.
Now we can use this to calculate the monthly percentage changes:
In [21]: # calculate the monthly percentage changes monthly_pc = monthly / monthly.shift(1) - 1 monthly_pc[:5] Out [21]: Ticker AA AAPL DAL ... MSFT PEP UAL 2012-01-31 NaN NaN NaN ... NaN NaN NaN 2012-02-29 0.004 0.188 -0.070 ... 0.082 -0.034 -0.106 2012-03-30 -0.015 0.105 0.011 ... 0.016 0.054 0.041 2012-04-30 -0.029 -0.026 0.104 ... -0.007 -0.005 0.020 2012-05-31 -0.118 -0.011 0.105 ... -0.082 0.036 0.148 [5 rows x 9 columns]
From that result, calculate the monthly cumulative returns and plot the results:
In [22]: # calculate monthly cumulative return monthly_cr = (1 + monthly_pc).cumprod() monthly_cr[:5] Out [22]: Ticker AA AAPL DAL ... MSFT PEP UAL 2012-01-31 NaN NaN NaN ... NaN NaN NaN 2012-02-29 1.004 1.188 0.930 ... 1.082 0.966 0.894 2012-03-30 0.989 1.313 0.940 ... 1.099 1.019 0.931 2012-04-30 0.960 1.279 1.038 ... 1.091 1.013 0.949 2012-05-31 0.847 1.266 1.147 ... 1.001 1.050 1.090 [5 rows x 9 columns] In [23]: # plot the monthly cumulative returns monthly_cr.plot(figsize=(12,6)) plt.legend(loc=2);
The output is seen in the following screenshot:
This looks very similar to the daily returns, but overall, it is not as smooth. This is because it uses roughly a 30th of the data and is tied to the end of month.
You can get a feel for the difference in distribution of the daily percentage changes for a particular stock by plotting that data in a histogram. A trick with generating histograms for data such as daily returns is to select the number of bins to lump values into. We will use 50 bins, which gives you a good feel for the distribution of daily changes across three years of data.
To demonstrate, the following shows you the distribution of the daily percentage change for AAPL
:
In [24]: # histogram of the daily percentage change for AAPL aapl = daily_pc['AAPL'] aapl.hist(bins=50);
The output is seen in the following screenshot:
This chart tells us several things. First, most of the daily movements center around 0.0, and there is a small amount of skew to the left, but the data appears fairly symmetric. If we use the .describe()
method on this data, we will very quickly get some useful analysis to describe the histogram:
In [25]: # descriptive statistics of the percentage changes aapl.describe() Out [25]: count 753.000 mean 0.001 std 0.017 min -0.124 25% -0.007 50% 0.001 75% 0.011 max 0.089 Name: AAPL, dtype: float64
A valuable conclusion to be drawn from this is the 95 percent coverage interval, which varies from -0.007476
to 0.010893
. This states that over this 3-year period, the daily percent of change fit between these two percentages 95 percent of the time. This gives us a sense of the overall volatility in the stock. Technically, the closer these values, the less volatile the stock over that time.
We can plot the histograms of the daily percentage change for all the stocks in a single histogram matrix plot. This gives us a means to quickly determine the differences in stock behavior over these 3 years:
In [26]: # matrix of all stocks daily percentage changes histograms daily_pc.hist(bins=50, figsize=(8,6));
The output is seen in the following screenshot:
From this chart, we can very easily see the difference in performance of these nine stocks during this time, particularly, the skewedness (more exceptional values on one side of the mean) as well as easily being able to easily see the difference in the overall distribution at various confidence levels, thereby giving a quick view of which stocks have been more or less volatile.
This is an excellent demonstration of the value of a picture versus using raw data items. Exploratory analysis like this is made very simple by the pandas ecosystem and allows you to more agilely analyze information.
18.117.230.81