Performing a moving-average calculation

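The moving average of a stock can be calculated with the rolling-window functions in pandas. Older pandas releases expose this as the rolling_mean() function in the pd.stats.moments namespace; newer releases provide the same calculation through the .rolling() method of a Series, followed by .mean().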

The moving average gives you a sense of the performance of a stock over a given time period by eliminating "noise" in the performance of the stock. The larger the moving window, the smoother and less random the graph will be, at the expense of responsiveness: a wide window lags behind recent changes in price.

To demonstrate, the following calculates the moving average for MSFT over 30- and 90-day windows using the daily close. The difference in noise reduction is easy to see in the plot:

In [27]:
   # extract just the MSFT close as a Series
   msft_close = close_px['MSFT']
   # calculate the 30 and 90 day rolling means
   # (.rolling().mean() replaces the older pd.stats.moments.rolling_mean)
   ma_30 = msft_close.rolling(30).mean()
   ma_90 = msft_close.rolling(90).mean()
   # compose into a DataFrame that can be plotted
   result = pd.DataFrame({'Close': msft_close, 
                          '30_MA_Close': ma_30,
                          '90_MA_Close': ma_90})
   # plot all the series against each other
   result.plot(title="MSFT Close Price")
   plt.gcf().set_size_inches(12,8)

The output is seen in the following screenshot:

[Screenshot: MSFT close price plotted with its 30-day and 90-day moving averages]
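To see the smoothing effect in numbers rather than in a plot, the following is a minimal sketch on synthetic data. The random-walk prices series and the seed are assumptions made for illustration and stand in for the MSFT close; the rolling calculation uses the .rolling().mean() form available in current pandas:

```python
import numpy as np
import pandas as pd

# synthetic daily closes: a random walk standing in for real prices
rng = np.random.default_rng(0)
dates = pd.date_range('2012-01-02', periods=500, freq='B')
prices = pd.Series(100 + rng.normal(0, 1, len(dates)).cumsum(), index=dates)

ma_30 = prices.rolling(30).mean()
ma_90 = prices.rolling(90).mean()

# the first window-1 values are NaN because the window is not yet full
print(ma_30.isna().sum(), ma_90.isna().sum())

# day-to-day movement shrinks as the window grows
print(prices.diff().std(), ma_30.diff().std(), ma_90.diff().std())
```

The wider window produces the smaller day-to-day movement, which is the numerical counterpart of the smoother line in the plot.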

The comparison of average daily returns across stocks

A scatter plot is an effective way to see the relationship between the rates of change in the prices of two stocks. The following graphs the daily percentage change in the closing price of MSFT against that of AAPL:

In [28]:
   # plot the daily percentage change of MSFT versus AAPL
   plt.scatter(daily_pc['MSFT'], daily_pc['AAPL'])
   plt.xlabel('MSFT')
   plt.ylabel('AAPL');

The output is seen in the following screenshot:

[Screenshot: scatter plot of daily percentage change, MSFT on the horizontal axis and AAPL on the vertical axis]

What this gives us is a very quick view of the overall correlation of the daily returns between the two stocks. Each dot represents a single day for both stocks. Each dot is plotted along the vertical based on the percentage change for AAPL and along the horizontal for MSFT.
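The daily_pc values plotted here are daily percentage changes. How daily_pc was built is not shown in this section; a minimal sketch of one way to compute such changes, assuming a DataFrame of closing prices with one column per ticker (the prices below are made up), is:

```python
import numpy as np
import pandas as pd

# made-up closing prices for two tickers over four days
close = pd.DataFrame({'MSFT': [100.0, 110.0, 99.0, 99.0],
                      'AAPL': [200.0, 190.0, 209.0, 209.0]})

# percentage change from the previous row; the first row has no predecessor
daily_change = close.pct_change()
print(daily_change)
```

Each row after the first then supplies one dot in the scatter plot: the MSFT value gives the horizontal position and the AAPL value the vertical one.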

If, each day, MSFT changed by an amount exactly proportional to the change in AAPL, all the dots would fall along a straight diagonal line running from the lower left to the upper right. In this case, the two variables would be perfectly correlated, with a correlation value of 1.0. If the two variables were completely uncorrelated, the correlation would be 0 and the dots would form a shapeless cloud; a line fitted through them would be horizontal.

To demonstrate what a perfect correlation would look like, we can plot MSFT versus MSFT. The correlation of any series with itself is always 1.0:

In [29]:
   # demonstrate perfect correlation
   plt.scatter(daily_pc['MSFT'], daily_pc['MSFT']);

The output is seen in the following screenshot:

[Screenshot: scatter plot of the MSFT daily percentage change against itself]

Getting back to the plot of AAPL versus MSFT, excluding several outliers, this cluster appears to demonstrate a moderate correlation between the two stocks.

The numbers tell a more modest story. An actual regression gives the fitted line a slope of 0.213, and the correlation matrix in the next section reports a correlation of 0.187 for this pair. The regression line would be closer to horizontal than diagonal. Statistically, this means that the change in the price of AAPL on a given day tells us little about the change in the price of MSFT on that day.
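Because a regression slope and a correlation coefficient are easy to confuse, the following sketch computes both on synthetic data. The series x and y and their parameters are assumptions for illustration, not the book's data; np.polyfit fits the line and np.corrcoef gives the Pearson correlation:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 0.015, 750)             # stand-in for one stock's daily changes
y = 0.2 * x + rng.normal(0, 0.015, 750)   # weakly related second series

slope, intercept = np.polyfit(x, y, 1)    # least-squares line y = slope*x + intercept
r = np.corrcoef(x, y)[0, 1]               # Pearson correlation coefficient

# the two are linked by the ratio of standard deviations
print(slope, r, r * y.std() / x.std())
```

The slope equals the correlation only when both series have the same standard deviation, so the two figures generally differ.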

To facilitate the bulk analysis of multiple correlations, pandas provides the very useful scatter matrix graph, which will plot the scatters for all combinations of stocks. This plot gives a very easy means of eyeballing correlations between all of the combinations:

In [30]:
   # plot the scatter of daily price changes for ALL stocks
   # (pd.scatter_matrix in older pandas releases)
   pd.plotting.scatter_matrix(daily_pc, diagonal='kde', figsize=(12,12));

The output is seen in the following screenshot:

[Screenshot: scatter matrix of daily percentage changes for all stocks, with kernel density estimates on the diagonal]

The diagonal in this plot is a kernel density estimation graph. It conveys essentially the same information as the histograms in the section on the distribution of daily percentage changes for a single stock, giving you a quick overview of how volatile the different stocks are relative to each other. Stocks with narrower curves are less volatile than those with wider curves, and the skew of a curve indicates a tendency toward greater returns or losses.
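The same comparison can be made numerically. The sketch below uses two synthetic return series (the column names LOW_VOL and HIGH_VOL are hypothetical) and summarizes each with its standard deviation, a common measure of volatility, and its skewness:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
returns = pd.DataFrame({
    'LOW_VOL': rng.normal(0, 0.01, 750),    # narrow distribution
    'HIGH_VOL': rng.normal(0, 0.03, 750),   # wide distribution
})

# one row per series: spread (std) and asymmetry (skew)
summary = pd.DataFrame({'std': returns.std(), 'skew': returns.skew()})
print(summary)
```

A larger std corresponds to a wider curve on the diagonal of the scatter matrix.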

The correlation of stocks based on the daily percentage change of the closing price

The previous section briefly mentioned the concept of correlation. Correlation is a measure of the strength of the association between two variables. A correlation coefficient of 1.0 means that every change in value in one set of data has a proportionate change in value in the other set of data. A correlation of 0.0 means that the data sets have no linear relationship. The closer the correlation is to 1.0 (or to -1.0, for variables that move in opposite directions), the better a change in one variable predicts a change in the other.
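As a concrete illustration of these values, the following sketch computes the Pearson correlation coefficient (the default used by the pandas .corr() method) for a few hand-made series. The data is invented for the example:

```python
import numpy as np
import pandas as pd

x = pd.Series(np.linspace(-1.0, 1.0, 201))

print(x.corr(3 * x + 1))   # exact increasing linear relationship
print(x.corr(-2 * x))      # exact decreasing linear relationship
print(x.corr(x ** 2))      # clearly related, but not linearly

# the same coefficient computed from its definition
y = 3 * x + 1
r_manual = ((x - x.mean()) * (y - y.mean())).sum() / (
    np.sqrt(((x - x.mean()) ** 2).sum()) * np.sqrt(((y - y.mean()) ** 2).sum()))
print(r_manual)
```

The third case shows why a correlation near 0 only rules out a linear relationship: x ** 2 is completely determined by x, yet the coefficient is essentially 0.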

The correlation between columns of data in a DataFrame can be calculated by calling its .corr() method. This produces a matrix holding the correlation between every pair of columns. To demonstrate, the following calculates the correlation in the daily percentage change in the close price for all of these stocks over the 3 years of the sample:

In [31]:
   # calculate the correlation between all the stocks relative
   # to daily percentage change
   corrs = daily_pc.corr()
   corrs

Out [31]:
   Ticker     AA   AAPL    DAL  ...     MSFT    PEP    UAL
   Ticker                       ...                       
   AA      1.000  0.236  0.251  ...    0.310  0.227  0.223
   AAPL    0.236  1.000  0.135  ...    0.187  0.092  0.062
   DAL     0.251  0.135  1.000  ...    0.149  0.174  0.761
   GE      0.458  0.239  0.317  ...    0.341  0.381  0.237
   IBM     0.311  0.212  0.168  ...    0.356  0.258  0.124
   KO      0.228  0.161  0.187  ...    0.271  0.557  0.139
   MSFT    0.310  0.187  0.149  ...    1.000  0.284  0.127
   PEP     0.227  0.092  0.174  ...    0.284  1.000  0.130
   UAL     0.223  0.062  0.761  ...    0.127  0.130  1.000

   [9 rows x 9 columns]

The diagonal is 1.0, as a series is always perfectly correlated with itself. This correlation matrix can be visualized using a heat map with the following code:

In [32]:
   # plot a heatmap of the correlations
   plt.imshow(corrs, cmap='hot', interpolation='none')
   plt.colorbar()
   plt.xticks(range(len(corrs)), corrs.columns)
   plt.yticks(range(len(corrs)), corrs.columns)
   plt.gcf().set_size_inches(8,8)

The output is seen in the following screenshot:

[Screenshot: heat map of the correlation matrix]

The idea with this diagram is that you can see the level of correlation via color by finding the intersection of vertical and horizontal variables. The darker the color, the less the correlation; the lighter the color, the greater the correlation. The diagonal is necessarily white (1.0), as it is each stock compared to itself.
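Colors make patterns easy to spot, but a ranked list is easier to read off precisely. As a sketch, the following rebuilds a four-stock subset of the matrix from the values printed in Out [31] and lists each pair once, strongest first; the name corrs_subset is introduced here for illustration:

```python
import numpy as np
import pandas as pd

tickers = ['AA', 'AAPL', 'DAL', 'UAL']
# values copied from the correlation matrix shown in Out [31]
corrs_subset = pd.DataFrame(
    [[1.000, 0.236, 0.251, 0.223],
     [0.236, 1.000, 0.135, 0.062],
     [0.251, 0.135, 1.000, 0.761],
     [0.223, 0.062, 0.761, 1.000]],
    index=tickers, columns=tickers)

# keep only entries strictly above the diagonal so each pair appears once
upper = np.triu(np.ones(corrs_subset.shape, dtype=bool), k=1)
pairs = (corrs_subset.where(upper)
         .stack()
         .dropna()
         .sort_values(ascending=False))
print(pairs)
```

In this subset the two airlines, DAL and UAL, stand out as the most strongly correlated pair, matching the lightest off-diagonal cell in the heat map.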
