9.4 Principal Component Analysis

An important topic in multivariate time series analysis is the study of the covariance (or correlation) structure of the series. For example, the covariance structure of a vector return series plays an important role in portfolio selection. In what follows, we discuss some statistical methods useful in studying the covariance structure of a vector time series.

Given a k-dimensional random vector $\boldsymbol{r} = (r_1, \ldots, r_k)'$ with covariance matrix $\boldsymbol{\Sigma}_r$, a principal component analysis (PCA) is concerned with using a few linear combinations of the $r_i$ to explain the structure of $\boldsymbol{\Sigma}_r$. If $\boldsymbol{r}$ denotes the monthly log returns of k assets, then PCA can be used to study the main sources of variation of these k asset returns. Here the keyword is few, so that simplification can be achieved in multivariate analysis.

9.4.1 Theory of PCA

Principal component analysis applies to either the covariance matrix $\boldsymbol{\Sigma}_r$ or the correlation matrix $\boldsymbol{\rho}_r$ of $\boldsymbol{r}$. Since the correlation matrix is the covariance matrix of the standardized random vector $\boldsymbol{D}^{-1}\boldsymbol{r}$, where $\boldsymbol{D}$ is the diagonal matrix of standard deviations of the components of $\boldsymbol{r}$, we use the covariance matrix in our theoretical discussion. Let $\boldsymbol{w}_i = (w_{i1}, \ldots, w_{ik})'$ be a k-dimensional real-valued vector, where $i = 1, \ldots, k$. Then

$$y_i = \boldsymbol{w}_i' \boldsymbol{r} = w_{i1} r_1 + w_{i2} r_2 + \cdots + w_{ik} r_k$$

is a linear combination of the random vector $\boldsymbol{r}$. If $\boldsymbol{r}$ consists of the simple returns of k stocks, then $y_i$ is the return of a portfolio that assigns weight $w_{ij}$ to the jth stock. Since multiplying $\boldsymbol{w}_i$ by a constant does not affect the proportion of allocation assigned to the jth stock, we standardize the vector $\boldsymbol{w}_i$ so that $\boldsymbol{w}_i'\boldsymbol{w}_i = \sum_{j=1}^{k} w_{ij}^2 = 1$.

Using properties of a linear combination of random variables, we have

$$\text{Var}(y_i) = \boldsymbol{w}_i' \boldsymbol{\Sigma}_r \boldsymbol{w}_i, \quad i = 1, \ldots, k, \tag{9.11}$$

$$\text{Cov}(y_i, y_j) = \boldsymbol{w}_i' \boldsymbol{\Sigma}_r \boldsymbol{w}_j, \quad i, j = 1, \ldots, k. \tag{9.12}$$

The idea of PCA is to find linear combinations $\boldsymbol{w}_i$ such that $y_i$ and $y_j$ are uncorrelated for $i \neq j$ and the variances of the $y_i$ are as large as possible. More specifically:

1. The first principal component of $\boldsymbol{r}$ is the linear combination $y_1 = \boldsymbol{w}_1'\boldsymbol{r}$ that maximizes $\text{Var}(y_1)$ subject to the constraint $\boldsymbol{w}_1'\boldsymbol{w}_1 = 1$.

2. The second principal component of $\boldsymbol{r}$ is the linear combination $y_2 = \boldsymbol{w}_2'\boldsymbol{r}$ that maximizes $\text{Var}(y_2)$ subject to the constraints $\boldsymbol{w}_2'\boldsymbol{w}_2 = 1$ and $\text{Cov}(y_2, y_1) = 0$.

3. The ith principal component of $\boldsymbol{r}$ is the linear combination $y_i = \boldsymbol{w}_i'\boldsymbol{r}$ that maximizes $\text{Var}(y_i)$ subject to the constraints $\boldsymbol{w}_i'\boldsymbol{w}_i = 1$ and $\text{Cov}(y_i, y_j) = 0$ for $j = 1, \ldots, i - 1$.

Since the covariance matrix $\boldsymbol{\Sigma}_r$ is nonnegative definite, it has a spectral decomposition; see Appendix A of Chapter 8. Let $(\lambda_1, \boldsymbol{e}_1), \ldots, (\lambda_k, \boldsymbol{e}_k)$ be the eigenvalue–eigenvector pairs of $\boldsymbol{\Sigma}_r$, where $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_k \geq 0$ and each $\boldsymbol{e}_i$ is properly normalized so that $\boldsymbol{e}_i'\boldsymbol{e}_i = 1$. We have the following statistical result.

Result 9.1

The ith principal component of $\boldsymbol{r}$ is $y_i = \boldsymbol{e}_i'\boldsymbol{r} = \sum_{j=1}^{k} e_{ij} r_j$ for $i = 1, \ldots, k$. Moreover,

$$\text{Var}(y_i) = \boldsymbol{e}_i' \boldsymbol{\Sigma}_r \boldsymbol{e}_i = \lambda_i, \quad i = 1, \ldots, k,$$

$$\text{Cov}(y_i, y_j) = \boldsymbol{e}_i' \boldsymbol{\Sigma}_r \boldsymbol{e}_j = 0, \quad i \neq j.$$

If some eigenvalues $\lambda_i$ are equal, the choices of the corresponding eigenvectors $\boldsymbol{e}_i$, and hence of the $y_i$, are not unique. In addition, we have

$$\text{Var}(r_1) + \cdots + \text{Var}(r_k) = \sum_{i=1}^{k} \lambda_i = \text{Var}(y_1) + \cdots + \text{Var}(y_k). \tag{9.13}$$

The result of Eq. (9.13) says that

$$\frac{\lambda_i}{\lambda_1 + \lambda_2 + \cdots + \lambda_k} = \text{proportion of total variance in } \boldsymbol{r} \text{ explained by the } i\text{th principal component}.$$

Consequently, the proportion of total variance in $\boldsymbol{r}$ explained by the ith principal component is simply the ratio between the ith eigenvalue and the sum of all eigenvalues of $\boldsymbol{\Sigma}_r$. One can also compute the cumulative proportion of total variance explained by the first i principal components [i.e., $(\sum_{j=1}^{i} \lambda_j)/(\sum_{j=1}^{k} \lambda_j)$]. In practice, one selects a small i such that the resulting cumulative proportion is large.
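As a concrete check of Result 9.1 and Eq. (9.13), the following sketch performs the eigen decomposition of a small covariance matrix in R; the matrix here is an arbitrary illustration, not one taken from this chapter.

> Sigma = matrix(c(4,2,0, 2,3,1, 0,1,2), 3, 3)  # an illustrative 3x3 covariance matrix
> ev = eigen(Sigma)                      # spectral decomposition
> ev$values                              # lambda_1 >= lambda_2 >= lambda_3, the Var(y_i)
> ev$vectors                             # columns are the normalized eigenvectors e_i
> ev$values/sum(ev$values)               # proportion of total variance explained
> cumsum(ev$values)/sum(ev$values)       # cumulative proportion explained
> c(sum(diag(Sigma)), sum(ev$values))    # Eq. (9.13): the two totals agree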

Since $\text{tr}(\boldsymbol{\rho}_r) = k$, the proportion of variance explained by the ith principal component becomes $\lambda_i/k$ when the correlation matrix is used to perform the PCA.

A by-product of the PCA is that a zero eigenvalue of $\boldsymbol{\Sigma}_r$, or of $\boldsymbol{\rho}_r$, indicates the existence of an exact linear relationship among the components of $\boldsymbol{r}$. For instance, if the smallest eigenvalue $\lambda_k = 0$, then by Result 9.1 $\text{Var}(y_k) = 0$. Therefore, $y_k = \boldsymbol{e}_k'\boldsymbol{r}$ is a constant and there are only $k - 1$ random quantities in $\boldsymbol{r}$. In this case, the dimension of $\boldsymbol{r}$ can be reduced. For this reason, PCA has been used in the literature as a tool for dimension reduction.
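The dimension reduction point is easy to see numerically: building an exact linear relationship into simulated data produces a (numerically) zero smallest eigenvalue. A minimal sketch with arbitrary simulated series:

> set.seed(1)
> r1 = rnorm(200); r2 = rnorm(200)
> r3 = r1 + r2                           # exact linear relationship among components
> eigen(cov(cbind(r1, r2, r3)))$values   # smallest eigenvalue is zero up to rounding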

9.4.2 Empirical PCA

In application, the covariance matrix $\boldsymbol{\Sigma}_r$ and the correlation matrix $\boldsymbol{\rho}_r$ of the return vector $\boldsymbol{r}_t$ are unknown, but they can be estimated consistently by the sample covariance and correlation matrices under some regularity conditions. Assuming that the returns are weakly stationary and that the data consist of $\{\boldsymbol{r}_t \mid t = 1, \ldots, T\}$, we have the following estimates:

$$\widehat{\boldsymbol{\Sigma}}_r = \frac{1}{T-1} \sum_{t=1}^{T} (\boldsymbol{r}_t - \bar{\boldsymbol{r}})(\boldsymbol{r}_t - \bar{\boldsymbol{r}})', \qquad \bar{\boldsymbol{r}} = \frac{1}{T} \sum_{t=1}^{T} \boldsymbol{r}_t, \tag{9.14}$$

$$\widehat{\boldsymbol{\rho}}_r = \widehat{\boldsymbol{D}}^{-1} \widehat{\boldsymbol{\Sigma}}_r \widehat{\boldsymbol{D}}^{-1}, \tag{9.15}$$

where $\widehat{\boldsymbol{D}} = \text{diag}\{\hat{\sigma}_1, \ldots, \hat{\sigma}_k\}$ is the diagonal matrix of sample standard deviations of $\boldsymbol{r}_t$, with $\hat{\sigma}_i^2$ the ith diagonal element of $\widehat{\boldsymbol{\Sigma}}_r$. Methods for computing eigenvalues and eigenvectors of a symmetric matrix can then be used to perform the PCA. Most statistical packages now have the capability to perform principal component analysis. In R and S-Plus, the basic command for PCA is princomp, and in FinMetrics the command is mfactor.
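The estimates in Eqs. (9.14) and (9.15) are also straightforward to compute directly. The sketch below does so on simulated returns and checks the results against the built-in cov and cor functions; the data are illustrative only.

> set.seed(2)
> nT = 100; k = 3                             # sample size T and dimension k
> rt = matrix(rnorm(nT*k), nT, k)             # T x k matrix of simulated returns
> rbar = colMeans(rt)                         # sample mean vector
> Sig = crossprod(sweep(rt, 2, rbar))/(nT-1)  # Eq. (9.14)
> Dinv = diag(1/sqrt(diag(Sig)))              # inverse of D-hat
> rho = Dinv %*% Sig %*% Dinv                 # Eq. (9.15)
> max(abs(Sig - cov(rt)))                     # essentially zero
> max(abs(rho - cor(rt)))                     # essentially zero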

Example 9.1

Consider the monthly log stock returns of International Business Machines, Hewlett-Packard, Intel Corporation, J.P. Morgan Chase, and Bank of America from January 1990 to December 2008. The returns are in percentages and include dividends. The data set has 228 observations. Figure 9.4 shows the time plots of these five monthly return series. As expected, returns of companies in the same industrial sector tend to exhibit similar patterns.

Figure 9.4 Time plots of monthly log stock returns in percentages and including dividends for (a) International Business Machines, (b) Hewlett-Packard, (c) Intel, (d) J.P. Morgan Chase, and (e) Bank of America from January 1990 to December 2008.


Denote the returns by $\boldsymbol{r}_t = (\text{IBM}, \text{HPQ}, \text{INTC}, \text{JPM}, \text{BAC})'$. The sample mean vector of the returns is $(0.70, 0.99, 1.20, 0.82, 0.41)'$, and the sample covariance and correlation matrices are

[sample covariance and correlation matrices not shown]

Table 9.3 gives the results of the PCA based on both the covariance and the correlation matrix. Also given are the eigenvalues, eigenvectors, and proportions of variability explained by the principal components. Consider the correlation matrix, and denote the sample eigenvalues and eigenvectors by $\hat{\lambda}_i$ and $\hat{\boldsymbol{e}}_i$. We have

[sample eigenvalues $\hat{\lambda}_1, \hat{\lambda}_2$ and eigenvectors $\hat{\boldsymbol{e}}_1, \hat{\boldsymbol{e}}_2$ not shown]

for the first two principal components. These two components explain about 74% of the total variability of the data, and they have interesting interpretations. The first component is a roughly equally weighted linear combination of the stock returns. It might represent the general movement of the stock market and hence is a market component. The second component represents the difference between the two industrial sectors, namely, technology versus financial services. It might be an industrial component. Similar interpretations of the principal components can also be obtained by using the covariance matrix of $\boldsymbol{r}_t$.

Table 9.3 Results of Principal Component Analysis for Monthly Log Returns, Including Dividends, of Stocks of IBM, Hewlett-Packard, Intel, J.P. Morgan Chase, and Bank of America from January 1990 to December 2008^a

[table entries not shown]

^a The eigenvectors are in columns.

An informal but useful procedure for determining the number of principal components needed in an application is to examine the scree plot, which is a plot of the eigenvalues $\hat{\lambda}_i$ ordered from largest to smallest (i.e., a plot of $\hat{\lambda}_i$ versus i). Figure 9.5(a) shows the scree plot for the five stock returns of Example 9.1. By looking for an elbow in the scree plot, indicating that the remaining eigenvalues are relatively small and all about the same size, one can determine the appropriate number of components. For both plots in Figure 9.5, two components appear to be appropriate. Finally, except for the case in which $\lambda_j = 0$ for $j > i$, selecting the first i principal components only provides an approximation to the total variance of the data. If a small i gives a good approximation, then the simplification is valuable.
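A scree plot is also easy to draw by hand from the ordered eigenvalues; a minimal sketch, assuming the return data are already in a numeric matrix rtn as in the remark below:

> lambda = eigen(cor(rtn))$values   # eigenvalues of the correlation matrix, in decreasing order
> plot(lambda, type="b", xlab="Component", ylab="Eigenvalue", main="Scree plot")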

Remark

The R and S-Plus commands used to perform the PCA are given below. Note that princomp reports the square roots of the eigenvalues and labels them as standard deviations.

Figure 9.5 Scree plots for two 5-dimensional asset returns: (a) series of Example 9.1 and (b) bond index returns of Example 9.3.


> rtn=read.table("m-5clog-9008.txt",header=T)

> pca.cov = princomp(rtn)        # PCA using the sample covariance matrix

> names(pca.cov)

> summary(pca.cov)

> pca.cov$loadings

> screeplot(pca.cov)

> pca.corr=princomp(rtn,cor=T)   # PCA using the sample correlation matrix

> summary(pca.corr)
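
The eigenvalues themselves are recovered by squaring the reported standard deviations; for example, continuing the session above (note that princomp uses the divisor T rather than T − 1 when it forms the covariance matrix):

> pca.corr$sdev^2                                  # the sample eigenvalues
> cumsum(pca.corr$sdev^2)/sum(pca.corr$sdev^2)     # cumulative proportion explained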
