CHAPTER 11
Support Vector Machine‐Based Global Tactical Asset Allocation

Joel Guglietta

11.1 INTRODUCTION

In this chapter we show how machine learning, more specifically support vector machine/regression (SVM/R), can help build a global tactical asset allocation (GTAA) portfolio. First, we will present a quick literature review on GTAA, explaining the different families of asset allocation. We will then go through a historical perspective on tactical asset allocation over the last 50 years, introducing the seminal concepts behind it. Section 11.3 will explain support vector machines (SVM) and support vector regression (SVR). Section 11.4 will present the machine learning model used for tactical asset allocation and discuss the results.

11.2 FIFTY YEARS OF GLOBAL TACTICAL ASSET ALLOCATION

Running the risk of stating the obvious, the objective of asset allocation is to obtain the best expected return‐to‐risk portfolio (Dahlquist and Harvey, 2001). The authors distinguish three families of asset allocation: (i) benchmark asset allocation, (ii) strategic asset allocation, (iii) GTAA (see Figure 11.1). The investment portfolio strategy built in this chapter belongs to the third class, in which prediction models use today's information set in order to forecast asset returns.

Flow diagram depicting the three families of asset allocation, showing for each how portfolio weights, the model and the information set relate, with arrows indicating the relationships.

Figure 11.1 Three families of asset allocation.

Source: Dahlquist and Harvey (2001).

Practitioners have been managing GTAA strategies for almost 50 years. GTAA broadly refers to actively managed portfolios that seek to enhance portfolio performance by 'opportunistically shifting the asset mix in a portfolio in response to the changing patterns of return and risk' (Martellini and Sfeir, 2003). Ray Dalio, CEO of Bridgewater, made this approach popular in the 1990s with his 'All‐Weather' portfolio.

The theory backing such an investment approach is well documented. W. Sharpe showed in 1963 that assets' returns can be decomposed into a systematic and a specific component. Armed with this time‐honoured framework, portfolio managers deploy two forms of active strategies: (i) market timing, which aims at exploiting predictability in systematic return, and (ii) stock picking, which aims at exploiting predictability in specific return. The academic literature suggests that there is ample evidence of predictability in the systematic component (Keim and Stambaugh 1986; Campbell 1987; Campbell and Shiller 1988; Fama and French 1989; Ferson and Harvey 1991; Bekaert and Hodrick 1992; Harasty and Roulet 2000), which is less true for the specific component.

After Samuelson (1969) and Merton (1969, 1971, 1973), who showed that optimal portfolio strategies are significantly affected by the presence of a stochastic opportunity set, optimal portfolio decision rules have been enriched to factor in the presence of predictable returns (Barberis 2000; Campbell and Viceira 1998; Campbell et al. 2000; Brennan et al. 1997; Lynch and Balduzzi 1999, 2000; Ait‐Sahalia and Brandt 2001). In a nutshell, all these models suggest that investors should increase their allocation to risky assets in periods of high expected returns (market timing) and decrease their allocation in periods of high volatility (volatility timing). Interestingly enough, Kandel and Stambaugh (1996) argue that even a low level of statistical predictability can generate economic significance and abnormal returns may be attained even if the market is successfully timed only 1 out of 100 times.

In essence, GTAA is a two‐step process: practitioners first forecast returns by asset class, then build portfolios based on these forecasts. Close to GTAA but without the forecasting part, risk parity portfolios (Hurst et al. 2010) are now a behemoth in the making, with almost US$3 trillion managed according to this method. Risk parity is often said to be the 'cheap' version of Bridgewater's 'All‐Weather' portfolio. We agree. GTAA and risk parity bear some similarities as they both try to exploit the one and only free lunch out there: diversification. However, risk parity is nothing but a mere 'technicality' for portfolio construction (the so‐called 'one‐to‐sigma' approach, where the weight of a given instrument is inversely proportional to its realized, sometimes expected, volatility). GTAA, by contrast, conditions the asset mix on the current information set in order to build a portfolio which is better fitted (i.e. hopefully delivering a higher return‐to‐risk profile) to the current (or expected) economic cycle. For instance, Chong and Phillips (2014) build two GTAA portfolios based on 18 economic factors using their 'Eta pricing model': one is mean–variance optimized (ECR‐MVO), the other is constructed to reduce its economic exposure (MIN). Both are long‐only portfolios and are rebalanced semi‐annually.
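As a concrete illustration of this 'one‐to‐sigma' weighting, here is a minimal sketch in Python; the return data and the function name are hypothetical and serve only to show the inverse‐volatility mechanics.

```python
import numpy as np

def one_to_sigma_weights(returns: np.ndarray) -> np.ndarray:
    """Risk-parity-style 'one-to-sigma' weights: each instrument is weighted in
    inverse proportion to its realized volatility, then normalized to sum to one.

    returns : (T, N) array of daily returns, one column per instrument.
    """
    sigma = returns.std(axis=0, ddof=1)   # realized volatility per instrument
    raw = 1.0 / sigma                     # inverse-volatility weights
    return raw / raw.sum()                # normalize so the weights sum to 1

# Toy example: three instruments with increasing volatility receive decreasing weights.
rng = np.random.default_rng(0)
toy_returns = rng.normal(0.0, [0.005, 0.010, 0.020], size=(260, 3))
print(one_to_sigma_weights(toy_returns))
```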

To conclude, the holy grail of GTAA remains to build a portfolio which performs equally well in any kind of economic environment. In order to achieve this, the portfolio manager has to find the optimal asset mix. The asset mix is usually made of fixed income (long‐term and intermediate), equity and commodity (others sometimes add real estate). The economic cycle impacting this asset mix can be modelled with different levels of granularity. We follow R. Dalio in trying not to over‐complexify things and choose a sparse model of the economic cycle, using only soft data (surveys) for the real business cycle (RBC) indicators and realized inflation.

11.3 SUPPORT VECTOR MACHINE IN THE ECONOMIC LITERATURE

A thorough introduction to SVM and support vector regression is beyond the scope of this chapter. However, for financial practitioners who are not familiar with these methods, we feel it is appropriate to explain briefly how SVM and support vector regression operate (taking the time to define some useful mathematical notions) and why we chose this method over alternative machine learning algorithms. We address the basic technicalities of SVM here.

As Y. Abu‐Mostafa (Caltech) puts it, SVM is arguably the most successful classification method in machine learning with a neat solution which has a very intuitive interpretation. Motivated by Statistical Learning Theory, SVM is a ‘learning machine’ introduced by Boser, Guyon and Vapnik in 1992 that falls into the category of supervised estimation algorithms (a learning algorithm that analyzes the training data and produces an inferred function, which can be used for mapping new data points). It is made up of three steps:

  1. Parameter estimation, i.e. training from a data set.
  2. Computation of the function value, i.e. testing.
  3. Generalization accuracy, i.e. performance.

As M. Sewell (2008, 2010) notes, ‘the development of Artificial Neural Networks (ANNs) followed a heuristic path, with applications and extensive experimentation preceding theory. In contrast, the development of SVMs involved sound theory first, then implementation and experiments.’

As far as parameter estimation is concerned, ‘a significant advantage of SVMs is that whilst ANNs can suffer from multiple local minima, the solution to an SVM is global and unique’. This is due to the fact that training involves optimization of a convex cost function, which explains why there is no local minimum to complicate the learning process. Testing is based on the model evaluation using the most informative patterns in the data, i.e. support vectors (the points upon which the separating hyperplanes lie). Performance is based on error rate determination as test set size grows to infinity.

SVMs have further advantages over ANNs. First, they have a simple geometric interpretation and give a sparse solution. Unlike ANNs, the computational complexity of SVMs does not depend on the dimensionality of the input space. Second, while ANNs use empirical risk minimization (which does not work very well in practice as the bounds are way too loose), SVMs use structural risk minimization (SRM). In their seminal 1974 paper, V. Vapnik and A. Chervonenkis set out the SRM principle which uses the VC (for Vapnik–Chervonenkis) dimension. The VC dimension is a measure of the capacity (complexity) of a space of functions that can be learned by a statistical classification algorithm. The SRM is an inductive principle for model selection used for learning from finite training data sets. It describes a general model of capacity control and provides a tradeoff between hypothesis space complexity and the quality of fitting the training data (empirical error). Sewell (ibid.) defines the procedure as below (a small numerical sketch follows the list).

  1. Using a priori knowledge of the domain, choose a class of functions, such as polynomials of degree n, neural networks having n hidden layer neurons, a set of splines with n nodes or fuzzy logic (a form of many‐valued logic in which the truth values of variables may be any real number between 0 and 1) models having n rules.
  2. Divide the class of functions into a hierarchy of nested subsets in order of increasing complexity. For example, polynomials of increasing degree.
  3. Perform empirical risk minimization on each subset (this is essentially parameter selection).
  4. Select the model in the series whose sum of empirical risk and VC confidence is minimal.
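To make the recipe above concrete, here is a small numerical sketch under stated assumptions: the hypothesis classes are one‐dimensional polynomials of increasing degree (VC dimension n + 1), the empirical‐risk step uses a least‐squares fit as a simple stand‐in for exact empirical risk minimization, the VC‐confidence term is Vapnik's standard bound, and the data are synthetic. It illustrates the SRM selection logic, not the authors' procedure.

```python
import numpy as np

def vc_confidence(h: int, n: int, eta: float = 0.05) -> float:
    """Vapnik's VC-confidence term for VC dimension h, sample size n, confidence 1 - eta."""
    return np.sqrt((h * (np.log(2.0 * n / h) + 1.0) - np.log(eta / 4.0)) / n)

# Step 1: a priori class of functions -- polynomial classifiers on synthetic 1-D data.
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, 200)
y = np.sign(np.sin(3.0 * x) + 0.3 * rng.normal(size=x.size))  # labels in {-1, +1}

best_degree, best_bound = None, np.inf
for degree in range(1, 8):                        # Step 2: nested subsets of increasing degree
    X = np.vander(x, degree + 1)                  # polynomial features of the chosen degree
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # Step 3: (surrogate) empirical risk minimization
    emp_risk = np.mean(np.sign(X @ coef) != y)    # training error rate
    bound = emp_risk + vc_confidence(h=degree + 1, n=x.size)  # Step 4: risk + VC confidence
    if bound < best_bound:
        best_degree, best_bound = degree, bound

print("SRM-selected polynomial degree:", best_degree)
```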

SVMs often outperform ANNs in practice because they deal with the biggest problem that ANNs face, i.e. overfitting. As they are less prone to this cardinal disease, they 'generalize' in a much better way. We should note, however, that while the use of a kernel function enables the curse of dimensionality to be addressed, the proper kernel for a given problem depends on the specific dataset, and as such there is no general method for choosing the kernel function (Chaudhuri 2014). From a practical point of view, the biggest limitation of the support vector approach lies in the choice of the kernel (Burges 1998; Horváth 2003).

SVM can be applied to both classification and regression. When an SVM is applied to a regression problem, it is called support vector regression. What is the difference between SVM and SVR? SVR is based on the computation of a linear regression function in a high‐dimensional feature space into which the input data are mapped via a non‐linear function. In order to give an intuition of how SVR works, let's assume we are given a linearly separable set of points belonging to two different classes yi ∈ {−1, +1}. The objective of an SVM is to find a particular hyperplane separating these two classes with minimum error, while also making sure that the perpendicular distance between the closest points from either of these two classes is maximized. In order to determine this hyperplane, we set constraints like this:

$$y_i\,(\mathbf{w}'\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, N$$

(i.e. $\mathbf{w}'\mathbf{x}_i + b \ge +1$ when $y_i = +1$ and $\mathbf{w}'\mathbf{x}_i + b \le -1$ when $y_i = -1$).

It is straightforward to transform this classification problem into a regression problem. Let's write: $y_i - \mathbf{w}'\mathbf{x}_i - b \le \varepsilon$ and $-(y_i - \mathbf{w}'\mathbf{x}_i - b) \le \varepsilon$.

The two equations above state that the hyperplane has points on either side of it such that the distance between these points and the hyperplane is no greater than ε. In a two‐dimensional plane, this comes down to trying to draw a line somewhere in the middle of the set of points such that this line is as close to them as possible. This is precisely what SVR is doing. Instead of minimizing the observed training error, SVR attempts to minimize the generalization error bound so as to achieve good generalization performance.
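To make the ε‐tube intuition concrete, here is a minimal sketch using scikit-learn's SVR on synthetic data; the parameter values are illustrative assumptions, not the chapter's production settings.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy linear data: SVR fits a function surrounded by a tube of half-width epsilon.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 100).reshape(-1, 1)
y = 0.5 * X.ravel() + rng.normal(scale=0.2, size=100)

svr = SVR(kernel="linear", C=10.0, epsilon=0.1)   # epsilon = half-width of the insensitive tube
svr.fit(X, y)

# Only points on or outside the epsilon-tube become support vectors; interior points do not.
print("slope:", svr.coef_.ravel(), "intercept:", svr.intercept_)
print("number of support vectors:", len(svr.support_))
```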

SVM provides a novel approach to the two‐category classification problem, such as crisis versus non‐crisis (Burges 1998). The method has been successfully applied to a number of applications, ranging from particle identification, face identification and text categorization to engine knock detection, bioinformatics and database marketing. For instance, A. Chaudhuri uses an SVM for currency crisis detection. Lai and Liu (2010) compare the performance in financial market prediction of an ANN approach and the regression feature of SVM. The historical values used are those of the Hang Seng Index (HSI) from 2002 to 2007 and data for January 2007 and January 2008. SVM performs well in the short‐term forecast. Other authors such as Shafiee et al. (2013) get an accuracy rate as high as 92.16% in forecasting Iranian stock returns. Using daily closing prices for 34 technology stocks to calculate price volatility and momentum for individual stocks and for the overall sector, Madge (2015) uses an SVM to predict whether a stock price some time in the future will be higher or lower than it is on a given day. Though the author finds little predictive ability in the short run, he finds definite predictive ability in the long run.

Bajari et al. (2015) note that applied econometricians voice scepticism about machine learning models because they do not have a clear interpretation and it is not obvious how to apply them to estimate causal effects. Some recent work suggests, however, that such machine learning methods yield interesting results. For instance, McNelis and McAdam (2004) apply linear and neural network‐based 'thick' models for forecasting inflation based on Phillips–curve formulations in the US, Japan and the euro area. Thick models represent 'trimmed mean' forecasts from several neural network models. They outperform the best‐performing linear models for 'real‐time' and 'bootstrap' forecasts for service indices for the euro area, and do well, sometimes better, for the more general consumer and producer price indices across a variety of countries. Back to the SVM, Bajari et al., tackling the problem of demand estimation, focus on three classes of model: (i) linear regression as the baseline model, (ii) logit as the econometric model, (iii) stepwise, forward stage‐wise, LASSO, random forest, SVM and bagging as the machine learning models. Interestingly enough, they show that machine learning models consistently give better out‐of‐sample prediction accuracy while keeping in‐sample prediction error comparable. SVR has also been applied to time series and financial prediction. For example, Zhang and Li (2013) use SVR to forecast CPI. Money gap and CPI historical data are utilized to perform the forecasts. Furthermore, the grid search method is applied to select the parameters of the SVR. In addition, this study examines the feasibility of applying SVR to inflation forecasting by comparing it with a back‐propagation neural network and linear regression. The result shows that SVR provides a promising alternative for inflation prediction.

11.3.1 Understanding SVM

SVM is essentially an algorithm used to solve a classification problem such as deciding which stocks to buy and to sell. The main notion boils down to maximizing the ‘margin’ between these two groups of stocks. The so‐called ‘kernel trick’ (defined below) is used to address non‐linearities. The main mathematics involved is some college geometry and quadratic optimization (derivative calculus).

We first assume a linearly separable data set, for example a set of four stocks. In order to better visualize our problem, let's assume we measure two attributes xi on these four stocks (such as earnings quality and price momentum, for example) in the input space X. These two attributes form a two‐dimensional space. At a given time, one can plot the four stocks in this plane (scatter plots in Figure 11.2). Let's assume two classes yi ∈ {−1, +1}, with long stocks (+1, the green dots) and short stocks (−1, the red dots). The problem we try to solve is whether there is any advantage to choosing one separating line over other lines. One should first note that such a line is a hyperplane (of dimension 1, hence a line) with equation w′x = 0, with w the vector of weights (w′ being its transpose).


Figure 11.2 The kernel trick

Let's examine the three examples above and ask ourselves which is the best line to separate the points. In case (a) the margin is smaller than in case (b), which is in turn smaller than in case (c). In all three cases the in‐sample error is zero. As far as generalization is concerned, since we are dealing with the same four linearly separable points, the generalization estimate will be the same. Intuitively, however, one should feel that a fat margin (case c) is better. This brings two questions: (i) Why is a fatter margin better? (ii) How can we solve for the weight w that maximizes this margin?

In all likelihood, the process that generates the data is noisy. Therefore, when the margin is thin, the chance of having a point which is misclassified is higher than in the case of a fatter margin. This gives an intuition as to why a fatter margin is better. The proof is based on the so‐called Vapnik‐Chervonenkis analysis where one can show that a fatter margin ushers in a lower Vapnik‐Chervonenkis dimension (the VC dimension being the cardinality of the largest set of points that the algorithm can shatter). Practically, a fatter margin implies better out‐of‐sample performance.

Now, let's find the weight w that maximizes the margin. The margin is simply the distance D from a plane to a point, which brings us back to our college geometry. Let's define xn as the nearest data point to the separating line (hyperplane) w′x = 0. How far is this point from the hyperplane? Before computing this distance, let's address two technicalities.

First, we normalize w. Let's note that we have |w′xn| > 0 (the nearest point does not lie on the hyperplane). The objective is to relate w to the margin. Note that we can scale w up and down, as the hyperplane equation (w′x = 0) is scale‐invariant. Without loss of generality, among all the representations of the same hyperplane we just pick the one for which |w′xn| = 1 for the nearest point. This will simplify the analysis later on.

Second, we introduce an artificial coordinate x0. Think of it as a constant to which we assign a weight w0. In order to avoid confusion, we rename this weight w0 as the bias b. We now have a new ('new' compared with the vector w used in w′x = 0) weight vector w = (w1, …, wp), with p the number of attributes (such as earnings quality, price momentum, Merton's distance to default, liquidity). We will see that this new vector w and the bias b play different roles when we solve for the maximum margin, so it is no longer convenient to have both blended in the same vector. The equation of the hyperplane is now w′x + b = 0, with w = (w1, …, wp).

We can now compute the distance D between xn and the hyperplane of equation w′x + b = 0, where |w′xn + b| = 1.

First, the vector w is perpendicular to the plane in the input space X. This is straightforward to show. Let's consider any two points x1 and x2 on the plane. We have w′x1 + b = 0 and w′x2 + b = 0. Subtracting the two gives w′(x1 − x2) = 0, which shows that w is orthogonal to every vector (x1 − x2) lying in the plane and is therefore orthogonal to the plane.

Second, we take any point x on the plane. The projection of the vector going from x to xn (i.e. the vector xn − x) onto the vector w orthogonal to the plane is the distance D to the plane. To compute it, we first form the unit vector ŵ = w/‖w‖, i.e. w normalized by its norm ‖w‖. The distance is then the absolute value of the inner (dot) product: D = |ŵ′(xn − x)|. Hence D = (1/‖w‖)|w′(xn − x)| = (1/‖w‖)|w′xn + b − w′x − b| = 1/‖w‖, since |w′xn + b| = 1 and w′x + b = 0. One sees that the distance between the nearest point and the hyperplane is nothing but one over the norm ‖w‖ of w.
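As a quick numerical check of the formula D = |w′xn + b|/‖w‖ derived above, consider the following sketch (the hyperplane and the point are arbitrary, hypothetical values):

```python
import numpy as np

w = np.array([3.0, 4.0])     # normal vector of the hyperplane w'x + b = 0
b = -5.0
x_n = np.array([3.0, 2.0])   # an arbitrary point off the plane

# Distance via the projection argument: |w'(x_n - x)| / ||w|| for any x on the plane,
# which simplifies to |w'x_n + b| / ||w|| because w'x + b = 0 on the plane.
D = abs(w @ x_n + b) / np.linalg.norm(w)
print(D)                     # |3*3 + 4*2 - 5| / 5 = 12 / 5 = 2.4
```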

We can now formulate our optimization problem. Our objective is to

$$\max_{\mathbf{w},\,b}\ \frac{1}{\|\mathbf{w}\|}$$

subject to $\min_{n=1,2,\dots,N} |\mathbf{w}'\mathbf{x}_n + b| = 1$ (the minimum being taken over all the points 1, 2, …, N of the data set).

This is not a friendly optimization problem, as the constraint contains a minimum (and an absolute value, but that part is easy to deal with). As a consequence, we try to find an equivalent problem which is easier to solve, mainly by getting rid of the minimization inside the constraint.

First, we consider only the hyperplanes that separate the data set correctly, i.e. those for which the label yn (long or short) agrees with the signal (w′xn + b), so that we have |w′xn + b| = yn(w′xn + b), which allows us to get rid of the absolute value.

Second, instead of maximizing 1/‖w‖, we minimize the equivalent quadratic quantity (objective function)

$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\,\mathbf{w}'\mathbf{w}$$

subject to yn(w′xn + b) ≥ 1 for all points n = 1, 2, …, N.

Formally speaking, we face a constrained optimization problem where the objective is to minimize ½ w′w. This is usually solved by writing a Lagrangian expression. The minor complication here is that we have an inequality in the constraint. Solving such a Lagrangian under inequality constraints is known as the Karush–Kuhn–Tucker (KKT) approach.

The first step is to rewrite the inequality constraint yn(w′xn + b) ≥ 1 in zero form, i.e. as a 'slack' yn(w′xn + b) − 1 ≥ 0, multiply it by the Lagrange multiplier αn to get the expression αn(yn(w′xn + b) − 1), and subtract these terms from the objective function.

The Lagrange formulation of our optimization problem becomes:

$$L(\mathbf{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\,\mathbf{w}'\mathbf{w} \;-\; \sum_{n=1}^{N} \alpha_n \big( y_n(\mathbf{w}'\mathbf{x}_n + b) - 1 \big)$$

which we minimize

w.r.t. w and b with αn ≥ 0 (we put a restriction on the domain) being the Lagrange multipliers, each point in the data set having such a Lagrange multiplier.

Writing the gradient ∇wL of L(w, b, α) with respect to the vector w, we get the following condition:

$$\nabla_{\mathbf{w}} L = \mathbf{w} - \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n = \mathbf{0}$$

(we want the gradient to be the zero vector; this is the condition we impose to get the minimum).

Writing the partial derivative of L(w, b, α) with respect to the scalar b, we get another condition:

$$\frac{\partial L}{\partial b} = -\sum_{n=1}^{N} \alpha_n y_n = 0$$

At this juncture, in order to make the problem easier to solve, we substitute these two conditions in the original Lagrangian and transform this minimization problem into a maximization problem so that maximization over α (which is tricky as α has a range) becomes free from w and b. This refers to the dual formulation of the problem.

From the above conditions we get:

$$\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n \qquad \text{and} \qquad \sum_{n=1}^{N} \alpha_n y_n = 0$$

If we substitute these expressions into the Lagrangian L(w, b, α), after some manipulation we get the following constrained optimization problem in α alone:

$$L(\boldsymbol{\alpha}) = \sum_{n=1}^{N} \alpha_n \;-\; \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m\, \alpha_n \alpha_m\, \mathbf{x}_n'\mathbf{x}_m$$

One sees that w and b drop out of the optimization problem.

We maximize the above expression L(α) with respect to α, subject to (the annoying constraints) αn ≥ 0 for n = 1, 2, …, N and $\sum_{n=1}^{N} \alpha_n y_n = 0$.

Solving the above problem requires quadratic programming (quadratic programming packages usually perform minimization). We therefore minimize:

$$\min_{\boldsymbol{\alpha}}\ \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m\, \alpha_n \alpha_m\, \mathbf{x}_n'\mathbf{x}_m \;-\; \sum_{n=1}^{N} \alpha_n$$

subject to the same constraints.

The quadratic programming package gives us a vector α = (α1, α2, …, αN) from which we infer w:

$$\mathbf{w} = \sum_{n=1}^{N} \alpha_n y_n \mathbf{x}_n$$

The condition which is key to identifying the support vectors is the KKT complementarity condition, which is satisfied at the minimum. Its zero form is αn(yn(w′xn + b) − 1) = 0 for n = 1, 2, …, N, which means that either the Lagrange multiplier αn is zero or the slack (yn(w′xn + b) − 1) is zero. For all the interior points the slack is strictly positive, which means the Lagrange multiplier αn is zero.

The most important points in the dataset are those which define the hyperplane and the margin. These points xn are called support vectors – they support the hyperplane and are the ones for which αn > 0. All the other points are interior points.

Once we have found w, we pick any support vector and easily infer b from the equation yn(w′xn + b) = 1.
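The following minimal sketch ties these steps together, using scikit-learn's SVC as the quadratic-programming engine (a very large C approximates the hard-margin problem solved above). The four 'stocks' and their two attributes are hypothetical toy data, not the chapter's universe.

```python
import numpy as np
from sklearn.svm import SVC

# Four 'stocks' described by two attributes (e.g. earnings quality, price momentum),
# labelled long (+1) or short (-1): a linearly separable toy set.
X = np.array([[1.0, 1.0], [2.0, 2.5], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6)   # very large C ~ hard margin
clf.fit(X, y)

# dual_coef_ holds alpha_n * y_n for the support vectors only (alpha_n > 0);
# interior points have alpha_n = 0 and therefore do not appear.
alpha_times_y = clf.dual_coef_.ravel()
sv = clf.support_vectors_

# Recover w = sum_n alpha_n y_n x_n, then b from any support vector: y_n (w'x_n + b) = 1.
w = alpha_times_y @ sv
b = y[clf.support_[0]] - w @ sv[0]
print("w:", w, "b:", b, "margin width:", 2.0 / np.linalg.norm(w))
```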

So far, we have talked about the linearly separable case. But what about the non‐separable case? We can handle this case by transforming the inputs x into new variables z through a non‐linear function Z, with z = Z(x) in Rk. The optimization problem keeps exactly the same form, with the inner products $\mathbf{x}_n'\mathbf{x}_m$ replaced by $\mathbf{z}_n'\mathbf{z}_m$:

$$L(\boldsymbol{\alpha}) = \sum_{n=1}^{N} \alpha_n \;-\; \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} y_n y_m\, \alpha_n \alpha_m\, \mathbf{z}_n'\mathbf{z}_m$$

This is the 'kernel' trick which makes SVM so powerful in dealing with non‐linearities: rather than computing the scalar product explicitly in the high‐dimensional feature space, we use a kernel function that plays the role of this scalar product directly in terms of the original inputs. As an example, let's assume we are given a necklace with 30 pearls – 10 black pearls in the middle and then 10 red pearls on both sides. We are asked to draw one line and one line only to separate the black pearls from the red pearls. Let's assume the pearls first lie in a one‐dimensional space (a line, case a). Separating the pearls with one line only is not possible. However, in a two‐dimensional space reached with the help of a simple mapping z = Z(x), this becomes easy, as Figure 11.3 shows in case (b).


Figure 11.3 The kernel trick: a non‐separable case
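The sketch below replays the necklace example; the pearl coordinates are hypothetical, and the explicit map z = (x, x²) is just one simple choice of Z used for illustration.

```python
import numpy as np
from sklearn.svm import SVC

# 30 pearls on a line: 10 black (+1) in the middle, 10 red (-1) on each side.
x = np.linspace(-1.5, 1.5, 30).reshape(-1, 1)
y = np.where(np.abs(x.ravel()) < 0.5, 1, -1)

# Case (a): in the original one-dimensional space, no single threshold separates the classes.
print(SVC(kernel="linear", C=1e4).fit(x, y).score(x, y))        # training accuracy < 1.0

# Case (b): map x to z = (x, x^2); in this two-dimensional space one straight line suffices.
z = np.hstack([x, x ** 2])
print(SVC(kernel="linear", C=1e4).fit(z, y).score(z, y))        # training accuracy 1.0

# The kernel trick does the same mapping implicitly, e.g. with a degree-2 polynomial kernel.
print(SVC(kernel="poly", degree=2, coef0=1.0, C=1e4).fit(x, y).score(x, y))  # 1.0
```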

11.4 AN SVR‐BASED GTAA

Our GTAA is deployed using exchange traded funds (ETFs) covering all the asset classes usually found in such portfolios (14 instruments) (Table 11.1).

Table 11.1 Universe traded

Source: J. Guglietta.

Sector        Bloomberg ticker  Instrument name                                      Expense ratio
Equity        SPY US Equity     SPDR S&P 500 ETF Trust                               0.09%
Equity        QQQ US Equity     PowerShares QQQ Trust Series 1                       0.20%
Equity        VGK US Equity     Vanguard FTSE Europe ETF                             0.10%
Equity        EWJ US Equity     iShares MSCI Japan ETF                               0.48%
Equity        VWO US Equity     Vanguard FTSE Emerging Markets ETF                   0.14%
REIT          VNQ US Equity     Vanguard REIT ETF                                    0.12%
Fixed income  AGG US Equity     iShares Core U.S. Aggregate Bond ETF                 0.05%
Fixed income  LQD US Equity     iShares iBoxx $ Investment Grade Corporate Bond ETF  0.15%
Fixed income  TIP US Equity     iShares TIPS Bond ETF                                0.20%
Fixed income  MUB US Equity     iShares National Muni Bond ETF                       0.25%
Fixed income  HYG US Equity     iShares iBoxx $ High Yield Corporate Bond ETF        0.50%
Fixed income  EMB US Equity     iShares JP Morgan USD Emerging Markets Bond ETF      0.40%
Commodity     GLD US Equity     SPDR Gold Shares                                     0.40%
Commodity     DBC US Equity     PowerShares DB Commodity Index Tracking Fund         0.89%

11.4.1 Data

We use ETFs as they have a number of features that make them ideal investments for this purpose. The most attractive feature is diversity as the range of available ETFs includes almost every asset class. The wide range of ETFs allows us to construct a diversified portfolio using fewer investments and therefore less capital. ETFs are now a US$3 trillion global market, are more liquid than mutual funds and can be traded throughout the day. Finally, ETFs are cheaper to run than mutual funds. This lower cost tends to get passed on to investors.

11.4.2 Model description

As explained above, we build a prediction model using today's information in order to forecast asset returns. Each and every week t, for each instrument i, we forecast the return $\hat{r}_{i,t+k}$ one week ahead (k = 5 as our database is daily) using an SVR with a linear kernel (using Gaussian, radial basis function or polynomial kernels does not help) and three different categories of factors as 'predictor' variables. Formally speaking we have:

$$\hat{r}_{i,t+k} = f_i\!\left(\mathbf{x}_{i,t}\right)$$

with $\hat{r}_{i,t+k}$ the forecast return of instrument i over the next k days, $\mathbf{x}_{i,t}$ the vector of predictor variables for instrument i observed at time t, $f_i$ the regression function estimated by the linear‐kernel SVR,

and T, the rolling period over which the SVR is calibrated.

The first block is made of macro factors. Following R. Dalio/Bridgewater, we avoid over‐complexification and choose a limited number of economic time series in order to model the economic cycle. While Bridgewater uses (quarterly) gross domestic product (GDP) to model the RBC, we use four monthly soft‐data (survey) series. We use the same macro‐economic factors for all assets. These macro factors do not change from one week to the next. However, our experience suggests that it is wrong to believe financial markets factor in macro‐economic information quickly, and updating weekly forecasts based on monthly data adds value. As this model is currently in production, we do not disclose which time series we use. The fifth time series captures inflation.

The second category of factors is a measure of systemic risk. We use our preferred greed and fear index, which is based on the variance risk premium (i.e. distance between the implied and realized volatility) of US equity.

The third and last group of factors is endogenous: price momenta computed over lookback periods varying from one week to one year.

We chose to rebalance our GTAA on a weekly basis. Note that monthly or quarterly rebalancing yields good results too.
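To make the mechanics concrete, the sketch below runs one weekly forecasting step with a rolling linear-kernel SVR. The predictor matrix, window length T and SVR hyper-parameters are placeholders (the actual factors are not disclosed above), so treat it as an illustration of the procedure rather than the production model.

```python
import numpy as np
from sklearn.svm import SVR

def weekly_forecasts(prices: np.ndarray, factors: np.ndarray, T: int = 500, k: int = 5) -> np.ndarray:
    """One forecasting step of the GTAA engine (a sketch, not the production model).

    prices  : (n_days, N) daily close prices, one column per instrument.
    factors : (n_days, F) daily predictor variables (macro, risk, momentum) -- hypothetical here.
    Returns the k-day-ahead forecast return for each of the N instruments.
    """
    fwd_returns = prices[k:] / prices[:-k] - 1.0   # realized k-day forward returns
    X, Y = factors[:-k], fwd_returns               # align today's predictors with future returns
    X_train, Y_train = X[-T:], Y[-T:]              # rolling calibration window of length T
    x_today = factors[-1:]                         # today's information set

    forecasts = []
    for i in range(prices.shape[1]):               # one SVR per instrument
        model = SVR(kernel="linear", C=1.0, epsilon=0.001)
        model.fit(X_train, Y_train[:, i])
        forecasts.append(model.predict(x_today)[0])
    return np.array(forecasts)
```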

At the end of each and every week, the SVR hands us 14 (the number of instruments) forecast returns $\hat{r}_{i,t+k}$. We constrain the portfolio to be long only. As it can happen that $\hat{r}_{i,t+k} < 0$, we use a transformation (function) φ to constrain the forecast returns to be strictly positive, such that $\varphi(\hat{r}_{i,t+k}) > 0$ for all i. Finally, we scale these transformed forecast returns by the realized volatility of the instruments' daily returns in order to obtain a signal‐to‐noise ratio such as:

$$s_{i,t} = \frac{\varphi\!\left(\hat{r}_{i,t+k}\right)}{\sigma_{i,t}}$$

with $\sigma_{i,t}$ the realized volatility of instrument i's daily returns.
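One possible implementation of this signal step is sketched below; the softplus choice for φ is purely an assumption made for illustration, as the production transformation is not disclosed.

```python
import numpy as np

def signals_from_forecasts(forecasts: np.ndarray, daily_returns: np.ndarray) -> np.ndarray:
    """Turn raw SVR forecasts into strictly positive, volatility-scaled signals.

    forecasts     : (N,) forecast returns from the SVR step (may be negative).
    daily_returns : (T, N) realized daily returns used to estimate volatility.
    """
    phi = np.log1p(np.exp(forecasts))          # softplus: strictly positive, monotone (assumed phi)
    sigma = daily_returns.std(axis=0, ddof=1)  # realized volatility per instrument
    return phi / sigma                         # signal-to-noise ratios fed to the optimizer
```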

The last step boils down to plugging these signal‐to‐noise ratios into a portfolio optimization algorithm in order to build the portfolio. Portfolio construction, i.e. finding the optimal weights, is a rich field of research and a detailed discussion is beyond the scope of this chapter.

Many portfolio constructions are possible. The risk‐parity (so‐called one‐to‐sigma) portfolio is the simplest one. Other choices exist, from mean–variance optimization to equal risk contribution or the maximum diversification portfolio (which gives interesting results). We believe that portfolio managers are much more worried about left‐tail (downside) risk than they are about volatility per se. This is the reason why our favourite portfolio construction method is a conditional value‐at‐risk (CVaR) portfolio, a method we use in many of the models currently in production.

CVaR is defined as the expected loss conditional on the loss exceeding value‐at‐risk (VaR). Minimizing CVaR rather than VaR is preferred, as VaR is not a coherent measure of risk. Moreover, portfolios with low CVaR necessarily have low VaR as well. We are aware of the limitation of the CVaR portfolio, which may yield somewhat unstable solutions. However, we would note that this criticism extends to all portfolio construction methods.
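The sketch below casts the CVaR minimization as a linear program in the Rockafellar–Uryasev form, using the cvxpy modelling library (assumed available). The long-only and fully-invested constraints follow the text; the minimum-signal-exposure constraint is our illustrative way of coupling the SVR signals to the optimizer, not necessarily the production set-up.

```python
import numpy as np
import cvxpy as cp

def min_cvar_weights(scenario_returns: np.ndarray, signals: np.ndarray,
                     beta: float = 0.95, min_signal: float = 0.0) -> np.ndarray:
    """Long-only, fully invested portfolio minimizing CVaR (Rockafellar-Uryasev formulation).

    scenario_returns : (S, N) historical (or simulated) weekly return scenarios.
    signals          : (N,) signal-to-noise ratios from the SVR step.
    """
    S, N = scenario_returns.shape
    w = cp.Variable(N)        # portfolio weights
    zeta = cp.Variable()      # VaR level (auxiliary variable)
    u = cp.Variable(S)        # scenario losses in excess of the VaR level

    losses = -scenario_returns @ w
    cvar = zeta + cp.sum(u) / ((1.0 - beta) * S)

    constraints = [u >= losses - zeta, u >= 0,     # CVaR linearization
                   w >= 0, cp.sum(w) == 1,         # long only, fully invested
                   signals @ w >= min_signal]      # keep some exposure to the SVR signal
    cp.Problem(cp.Minimize(cvar), constraints).solve()
    return w.value
```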

Our GTAA is therefore a two‐step process where each and every week we forecast next week's returns with an SVR fed with macro factors, our greed and fear index and instrument price momenta. These expected returns are transformed into signals which are subsequently plugged into a CVaR portfolio. The portfolio weights are computed at the end of the week based on close prices and executed at the open of the next trading day (returns are computed net of expense fees and transaction costs).

11.4.3 Model results

Figure 11.4 shows the relative performance of our process compared with the often‐used (Hurst et al. 2010) benchmark strategy invested 60% in bonds and 40% in equity. The total compounded geometric return over the period (March 2001 to March 2017) is 189%, compared with 102% for the benchmark strategy. Our strategy therefore outperforms the benchmark by 87 percentage points and exhibits smaller drawdowns, especially during the global financial crisis. The one‐ and two‐year rolling information ratios (units of return per unit of risk) are (of course) not constant but have been hovering around 2 in the recent past. The total‐period information ratio is 0.77, i.e. 52.6% higher than that of the benchmark strategy (0.50). The annualized realized volatility of the strategy is 8.9%, i.e. 0.44% lower than that of the chosen benchmark. The stability of the strategy, measured as the R² of a linear fit to the cumulative log returns, is 91.7%, i.e. 40% higher than that of the benchmark (65.4%) (Figures 11.4 and 11.5).
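For readers who want to compute the same diagnostics on their own return series, here is a minimal sketch of the two metrics as defined in the text (the information ratio as return per unit of risk, and stability as the R² of a linear fit to cumulative log returns); the annualization factor of 52 assumes weekly returns.

```python
import numpy as np

def information_ratio(weekly_returns: np.ndarray, periods_per_year: int = 52) -> float:
    """Annualized return divided by annualized volatility (units of return per unit of risk)."""
    mu = weekly_returns.mean() * periods_per_year
    sigma = weekly_returns.std(ddof=1) * np.sqrt(periods_per_year)
    return mu / sigma

def stability(weekly_returns: np.ndarray) -> float:
    """R^2 of a linear fit to the cumulative log-return path."""
    cum_log = np.log1p(weekly_returns).cumsum()
    t = np.arange(cum_log.size)
    slope, intercept = np.polyfit(t, cum_log, 1)
    residuals = cum_log - (slope * t + intercept)
    return 1.0 - residuals.var() / cum_log.var()
```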

Line graph showing the SVR GTAA and the 60% bond, 40% equity benchmark from 14/3/2001 to 14/3/2017, with the bond/equity benchmark dipping to about −40% during 2008–09.

Figure 11.4 SVR GTAA compared to 60% bond, 40% equity (non‐compounded arithmetic returns).

Source: Bloomberg, J. Guglietta.

Line graph showing the one‐year and two‐year rolling information ratios from 14/3/2001 to 14/3/2017, with the one‐year ratio dipping to −2 during 2008–09.

Figure 11.5 SVR GTAA: one‐year and two‐year rolling information ratios.

11.5 CONCLUSION

We have presented a GTAA portfolio resting on a transparent 'quantamental' framework. We believe that diversification remains the only (almost) free lunch, and therefore the ability to build robust, diversified portfolios should be sought by all investors. Because of its machine learning characteristics, our SVR‐based GTAA portfolio can adapt its asset mix to different economic environments and provide such investors with a robust solution improving on what we described as the main goal of asset allocation: getting the best expected return‐to‐risk profile.

REFERENCES

  1. Ait‐Sahalia, Y. and Brandt, M.W. (2001). Variable selection for portfolio choice. Journal of Finance 56: 1297–1351.
  2. Barberis, N. (2000). Investing for the long run when returns are predictable. Journal of Finance 55: 225–264.
  3. Bekaert, G. and Hodrick, R.J. (1992). Characterizing predictable components in excess returns on equity and foreign exchange markets. Journal of Finance 47 (2): 467–509.
  4. Burges, C.J.C. (1998). A tutorial on support vector machines for patterns recognition. Data Mining and Knowledge Discovery 2: 121–167.
  5. Campbell, J.Y. (1987). Stock returns and the term structure. Journal of Financial Economics 18: 373–399.
  6. Campbell, J.Y. and Shiller, R.J. (1998). Valuation ratios and the long‐run stock market outlook. Journal of Portfolio Management 24 (2): 11–26.
  7. Campbell, J.Y. and Viceira, L. (1998). Who should buy long‐term bonds? NBER. Working Paper 6801.
  8. Campbell, J., Chan, Y., and Viceira, L. (2000). A multivariate model of strategic asset allocation. Working Paper. Harvard University.
  9. Chaudhuri, A. (2014). Support vector machine model for currency crisis discrimination. Birla Institute of Technology.
  10. Chong, J. and Phillips, M. (2014). Tactical asset allocation with macroeconomic factors. The Journal of Wealth Management 17 (1): 58–69.
  11. Dahlquist, M. and Harvey, C.R. (2001). Global tactical asset allocation. Duke University.
  12. Fama, E.F. and French, K.R. (1989). Business conditions and expected returns on stocks and bonds. Journal of Financial Economics 25: 23–49.
  13. Ferson, W.E. and Harvey, C.R. (1991). The variation of economic risk premiums. Journal of Political Economy 99 (2): 385–415.
  14. Harasty, H. and Roulet, J. (2000). Modelling stock market returns. The Journal of Portfolio Management 26 (2): 33–46.
  15. Horváth, G. (2003). Advances in learning theory: methods, models and applications. In: NATO‐ASI Series III: Computer and Systems Sciences, vol. 190 (ed. J.A.K. Suykens, G. Horvath, S. Basu, et al.). Amsterdam: IOS Press.
  16. Hurst, B., Johnson, B.W., and Ooi, Y.H. (2010). Understanding Risk Parity. AQR.
  17. Kandel, S. and Stambaugh, R.F. (1996). On the predictability of stock returns: an asset‐allocation perspective. The Journal of Finance 51 (2): 385–424.
  18. Keim, D.B. and Stambaugh, R.F. (1986). Predicting returns in the stock and bond market. Journal of Financial Economics 17 (2): 357–390.
  19. Lai, L.K.C. and Liu, J.N.K. (2010). Stock forecasting using Support Vector Machine. International Conference on Machine Learning and Cybernetics (ICMLC).
  20. Lynch, A.W. and Balduzzi, P. (1999). Transaction cost and predictability: some utility cost calculations. Journal of Financial Economics 52 (1): 47–78.
  21. Lynch, A.W. and Balduzzi, P. (2000). Predictability and transaction costs: the impact on rebalancing rules and behaviour. Journal of Finance 55: 2285–2310.
  22. Madge, S. (2015). Predicting stock price direction using support vector machines. Independent Work Report. Spring 2015.
  23. Martellini, L. and Sfeir, D. (2003). Tactical Asset Allocation. EDHEC.
  24. McNelis, P. and McAdam, P. (2004). Forecasting inflation with thick models and neural networks. European Central Bank, Working Paper Series No. 352, April.
  25. Merton, R.C. (1969). Lifetime portfolio selection under uncertainty: the continuous‐time case. The Review of Economics and Statistics 51: 247–257.
  26. Merton, R.C. (1971). Optimal consumption and portfolio rules in a continuous‐time model. Journal of Economic Theory 3: 373–413.
  27. Merton, R.C. (1973). An intertemporal capital asset pricing model. Econometrica 41: 867–888.
  28. Samuelson, P. (1969). Lifetime portfolio selection by dynamic stochastic programming. The Review of Economics and Statistics 51 (3): 239–246.
  29. Sewell, M. (2008). Structural risk minimization. Department of Computer Science: University College London.
  30. Sewell, M. (2010). The application of intelligent systems to financial time series analysis. PhD thesis, Department of Computer Science, University College London.
  31. Zhang, L. and Li, J. (2013). Inflation forecasting using support vector regression. 2012 Fourth International Symposium on Information Science and Engineering.