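The moving average of a stock can be calculated with pandas' rolling-window functions. Older versions of pandas exposed these in the pd.stats namespace, specifically the pd.stats.moments.rolling_mean() function; that function has been deprecated and is no longer available in current pandas, which instead uses the .rolling() method of a Series or DataFrame followed by .mean().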
The moving average gives you a sense of a stock's performance over a given time period by smoothing out day-to-day "noise" in its price. The larger the moving window, the smoother and less erratic the resulting line will be, at the expense of responsiveness: a longer window lags further behind recent price changes.
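As a minimal sketch of what a simple moving average computes, the following uses a hypothetical seeded random walk in place of real closing prices (the `prices` series is made up for illustration) and the current `Series.rolling(window=...).mean()` API:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a daily close series: a seeded random walk
rng = np.random.default_rng(seed=0)
prices = pd.Series(100 + rng.normal(0, 1, 2000).cumsum())

# Simple moving averages: the mean of the most recent `window` closes
ma_30 = prices.rolling(window=30).mean()
ma_90 = prices.rolling(window=90).mean()

# The first window-1 entries are NaN because the window is not yet full
print(ma_30.isna().sum(), ma_90.isna().sum())  # 29 89

# The first complete 30-day value is just the mean of the first 30 closes
print(np.isclose(ma_30.iloc[29], prices.iloc[:30].mean()))  # True

# Rough "noise" measure: mean absolute day-to-day change of each line
raw_noise = prices.diff().abs().mean()
noise_30 = ma_30.diff().abs().mean()
noise_90 = ma_90.diff().abs().mean()
print(raw_noise, noise_30, noise_90)
```

Because consecutive n-day averages differ by (p_t - p_{t-n}) / n, their day-to-day movement shrinks as the window grows, so for a random walk like this `noise_90` is typically smaller than `noise_30`; the exact values depend on the seed.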
To demonstrate, the following calculates the 30- and 90-day moving averages of the daily close for MSFT. The difference in noise reduction is easy to see in the plot:
In [27]:
# extract just the MSFT close
msft_close = close_px['MSFT']
# calculate the 30- and 90-day rolling means
ma_30 = msft_close.rolling(window=30).mean()
ma_90 = msft_close.rolling(window=90).mean()
# compose into a DataFrame that can be plotted
result = pd.DataFrame({'Close': msft_close,
                       '30_MA_Close': ma_30,
                       '90_MA_Close': ma_90})
# plot all the series against each other
result.plot(title="MSFT Close Price")
plt.gcf().set_size_inches(12, 8)
The output is seen in the following screenshot:
A scatter plot is an effective way to visually assess the relationship between the rates of change in the prices of two stocks. The following graphs the daily percentage change in the closing price of MSFT against that of AAPL:
In [28]:
# plot the daily percentage change of MSFT versus AAPL
plt.scatter(daily_pc['MSFT'], daily_pc['AAPL'])
plt.xlabel('MSFT')
plt.ylabel('AAPL');
The output is seen in the following screenshot:
This gives a quick view of the overall correlation of the daily returns of the two stocks. Each dot represents a single day: its horizontal position is the percentage change for MSFT on that day, and its vertical position is the percentage change for AAPL.
If AAPL's change each day were always a fixed positive multiple of MSFT's change, all the dots would fall on a single straight line running diagonally from the lower left to the upper right. In this case, the two variables would be perfectly correlated, with a correlation value of 1.0. If the two variables were completely uncorrelated, the dots would form a shapeless cloud, and the correlation, and hence the slope of the best-fit regression line through them, would be 0, which is perfectly horizontal.
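The following sketch, using made-up seeded data rather than the stock returns, checks these cases with `Series.corr()`. Note that `y_linear` has a slope of 3 yet still has a correlation of 1.0: correlation measures how tightly the points hug a line, not how steep the line is.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)
x = pd.Series(rng.normal(0, 1, 500))

# Perfectly linear, upward-sloping relationship: every point lies on one line
y_linear = 3 * x + 1
# Independent series: no linear relationship with x
y_independent = pd.Series(rng.normal(0, 1, 500))

print(round(x.corr(y_linear), 6))       # 1.0
print(round(x.corr(x), 6))              # 1.0 (a series against itself)
print(round(x.corr(y_independent), 2))  # near 0, though rarely exactly 0 for a finite sample
```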
To demonstrate what a perfect correlation looks like, we can plot MSFT against itself. Any series correlated with itself always has a correlation of 1.0:
In [29]:
# demonstrate perfect correlation
plt.scatter(daily_pc['MSFT'], daily_pc['MSFT']);
The output is seen in the following screenshot:
Getting back to the plot of AAPL versus MSFT: excluding several outliers, the cluster appears to show a modest positive correlation between the two stocks.
Running an actual regression shows the slope of the regression line to be 0.213, so the line is much closer to horizontal than to diagonal. (This slope is not the same quantity as the correlation coefficient; the correlation matrix computed later in this section puts the AAPL/MSFT correlation at 0.187.) In practical terms, the price change in AAPL on a given day tells us little about the price change in MSFT on that same day, so more often than not it would be a poor basis for predicting it.
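The regression slope and the correlation coefficient are related but generally differ: the least-squares slope equals r multiplied by the ratio of the standard deviations of the two series, so the two coincide only when those standard deviations are equal. A minimal sketch with hypothetical returns (not the actual AAPL/MSFT data), using `np.polyfit` for the fit:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
# Hypothetical daily returns: y depends weakly on x, plus independent noise
x = rng.normal(0, 0.015, 750)
y = 0.2 * x + rng.normal(0, 0.02, 750)

# Least-squares slope of y regressed on x (degree-1 polynomial fit)
slope, intercept = np.polyfit(x, y, 1)

# Pearson correlation coefficient of the same two series
r = np.corrcoef(x, y)[0, 1]

# The OLS slope equals r scaled by std(y) / std(x)
print(np.isclose(slope, r * y.std() / x.std()))  # True
print(slope, r)  # close but not equal, because std(y) != std(x)
```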
To facilitate the bulk analysis of multiple correlations, pandas provides a scatter matrix plot, which draws the scatter plot for every pair of stocks. This gives an easy way to eyeball the correlations between all of the combinations:
In [30]:
# plot the scatter of daily price changes for ALL stocks
pd.plotting.scatter_matrix(daily_pc, diagonal='kde', figsize=(12, 12));
The output is seen in the following screenshot:
Each plot on the diagonal is a kernel density estimate (KDE) of a single stock's daily percentage changes. If you refer to the section on using histograms to show the distribution of daily percentage changes for a single stock, this plot conveys essentially the same information, giving you a quick overview of how volatile the different stocks are relative to each other. Narrower curves indicate less volatile stocks than wider ones, and any skew indicates a tendency toward larger gains or larger losses.
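The width and skew that the KDE curves show visually can also be summarized numerically with `DataFrame.std()` and `DataFrame.skew()`. This sketch uses three made-up tickers (CALM, WILD, and LOPSIDED are hypothetical) with synthetic returns:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=3)
# Hypothetical daily percentage changes for three made-up tickers
returns = pd.DataFrame({
    "CALM": rng.normal(0, 0.01, 750),                   # narrow distribution
    "WILD": rng.normal(0, 0.03, 750),                   # wide distribution
    "LOPSIDED": rng.exponential(0.01, 750) - 0.01,      # long right tail
})

# Standard deviation summarizes the width of each KDE curve (volatility)
print(returns.std())
# Sample skewness: positive means a longer right tail (occasional large gains),
# negative means a longer left tail (occasional large losses)
print(returns.skew())
```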
The previous section briefly mentioned the concept of correlation. Correlation is a measure of the strength of the linear association between two variables, and the correlation coefficient ranges from -1.0 to 1.0. A coefficient of 1.0 means that every change in value in one data set is matched by a proportional change in the same direction in the other; -1.0 means the proportional change is always in the opposite direction. A correlation of 0.0 means the data sets have no linear relationship. The larger the magnitude of the correlation, the better a change in one variable predicts a change in the other.
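The coefficient that pandas reports by default is the Pearson correlation: the covariance of the two series divided by the product of their standard deviations. The following sketch computes it by hand (the `pearson` helper is hypothetical, written only for illustration) and compares it with `Series.corr()` on synthetic data:

```python
import numpy as np
import pandas as pd

def pearson(a: pd.Series, b: pd.Series) -> float:
    """Pearson correlation: covariance divided by the product of standard deviations."""
    da = a - a.mean()
    db = b - b.mean()
    return float((da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum()))

rng = np.random.default_rng(seed=4)
a = pd.Series(rng.normal(size=300))
b = 0.5 * a + pd.Series(rng.normal(size=300))  # partly driven by a

print(np.isclose(pearson(a, b), a.corr(b)))  # True
print(round(pearson(a, -a), 6))              # -1.0 (perfect negative correlation)
```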
The correlation between the columns of a DataFrame can be calculated simply by calling its .corr() method. This produces a matrix of the pairwise correlations between all of the columns. To demonstrate, the following calculates the correlations of the daily percentage change in the close price for all of these stocks over the 3 years of the sample:
In [31]:
# calculate the correlation between all the stocks relative
# to daily percentage change
corrs = daily_pc.corr()
corrs

Out [31]:
Ticker     AA   AAPL    DAL  ...   MSFT    PEP    UAL
Ticker                       ...
AA      1.000  0.236  0.251  ...  0.310  0.227  0.223
AAPL    0.236  1.000  0.135  ...  0.187  0.092  0.062
DAL     0.251  0.135  1.000  ...  0.149  0.174  0.761
GE      0.458  0.239  0.317  ...  0.341  0.381  0.237
IBM     0.311  0.212  0.168  ...  0.356  0.258  0.124
KO      0.228  0.161  0.187  ...  0.271  0.557  0.139
MSFT    0.310  0.187  0.149  ...  1.000  0.284  0.127
PEP     0.227  0.092  0.174  ...  0.284  1.000  0.130
UAL     0.223  0.062  0.761  ...  0.127  0.130  1.000

[9 rows x 9 columns]
The diagonal is 1.0, as a series is always perfectly correlated with itself. This correlation matrix can be visualized using a heat map with the following code:
In [32]:
# plot a heatmap of the correlations
plt.imshow(corrs, cmap='hot', interpolation='none')
plt.colorbar()
plt.xticks(range(len(corrs)), corrs.columns)
plt.yticks(range(len(corrs)), corrs.columns)
plt.gcf().set_size_inches(8, 8)
The output is seen in the following screenshot:
The idea with this diagram is that you can read the correlation between any two stocks from the color of the cell where one stock's row intersects the other's column. The darker the color, the lower the correlation; the lighter the color, the higher the correlation. The diagonal is necessarily white (1.0), as it compares each stock with itself.
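By default, imshow rescales the data so that the smallest value in the matrix maps to the darkest color and the largest maps to the lightest, so the colors are relative to this particular matrix rather than to the full -1.0 to 1.0 range. This sketch reproduces that mapping for the 'hot' colormap using `matplotlib.colors.Normalize`; the three sample values are chosen for illustration (0.062 and 1.0 are the smallest and largest entries in the matrix above):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize
import numpy as np

cmap = plt.get_cmap("hot")

# Mimic imshow's default scaling: data min -> 0, data max -> 1
values = np.array([0.062, 0.5, 1.0])  # lowest, a middle value, highest correlation
norm = Normalize(vmin=values.min(), vmax=values.max())
colors = cmap(norm(values))  # one RGBA row per value

# Brightness (mean of R, G, B) rises with the value; the maximum maps to white
brightness = colors[:, :3].mean(axis=1)
print(brightness)
print(colors[-1, :3])  # [1. 1. 1.]
```

If you want colors that are comparable across different matrices, passing explicit limits such as `vmin=-1, vmax=1` to `plt.imshow` fixes the scale instead of deriving it from the data.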