Chapter 10 - Confidence-Weighted Mean Reversion (2/3)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

76 CONFIDENCE-WEIGHTED MEAN REVERSION

The Final Optimization Problem 2: CWMR-Stdev

(μ

t+1

, ϒ

t+1

) = arg min



log



detϒ



+Tr(ϒ

−2

) +(μ

−μ)



−2

(μ

−μ)



s.t.  −μ



≥ φϒx

, ϒ is PSD



1 = 1, μ  0. (10.4)

Clearly, the revised optimization problem (10.2) is equivalent to the raw optimiza-

tion problem (10.1). From the revised problem, we proposed two ﬁnal optimization

problems, Equations 10.3 and 10.4, which are convex and thus can be efﬁciently

solved by convex optimization (Boyd and Vandenberghe 2004). The ﬁrst variation,

CWMR-Var, linearizes the constraint; thus, it results in an approximate solution for the

revised and the raw optimizations. In contrast, the second variation, CWMR-Stdev, is

equivalent to the revised optimization problem (10.2) and results in an exact solution

for both the revised and raw optimization problems.

Remarks on Formulations: Note that the short version of this chapter (Li et al.

2011b) assumes log utility (Bernoulli 1954; Latané 1959) on μ



and is slightly

different from this version (Li et al. 2013). Since both  and φ are adjustable, they have

similar effects on μ. Assuming other parameters are constant except μ,asμ



log μ



, the current linear form can move μ toward the mean reversion proﬁtable

portfolio more than the log form can. However, the log form in this constraint causes

another convexity issue besides the standard deviation on the right-hand side. To solve

the optimization problem with log, Li et al. (2011b) chose to replace the log term by

its linear approximation, which may converge to a different solution. Moreover, the

current form and log’s linear approximation are essentially the same.

∗

Thus, we adopt

return without log, which has no above convexity issues concerning log and its linear

approximation.

10.3 Algorithms

Now, let us devise the proposed algorithms based on the optimization problem,

Equations 10.3 and 10.4. Their solutions are shown in Propositions 10.1 and 10.2,

respectively. Both proofs are presented in Appendices B.3.1 and B.3.2, respectively.

Proposition 10.1 The solution to the ﬁnal optimization problem (10.3) (CWMR-Var)

without considering the non-negativity constraint (μ  0) is expressed as

t+1

= μ

−λ

t+1



−¯x

1), 

−1

t+1

= 

−1

+2λ

t+1

φx



where λ

t+1

corresponds to the Lagrangian multiplier calculated as Equation B.10 in

Appendix B.3.1 and ¯x



denotes the CW average of x

∗

Current form is μ



. Li et al. (2011b) uses approximation,that is, log μ



≈ log μ



·(μ−μ

)



Both terms are linear, although their scales are different.

T&F Cat #K23731 — K23731_C010 — page 76 — 9/28/2015 — 21:24

ALGORITHMS 77

Proposition 10.2 The solution to the ﬁnal optimization problem (10.4) (CWMR-

Stdev) without considering the non-negativity constraint (μ  0) is expressed as

t+1

= μ

−λ

t+1



−¯x

1), 

−1

t+1

= 

−1

+λ

t+1



√

where λ

t+1

denotes the Lagrangian multiplier calculated as Equation B.14 in

Appendix B.3.2, ¯x



represents the CW average of x

, and V

= x



and

√

−λ

t+1

φV

t+1

+4V

denote the return variances for period t and t +1,

respectively.

Initially, with no information available for the task, we simply initialize μ

uniform and each diagonal element of the covariance matrix 

, or equiva-

lent standard deviation

. Note that we solve the optimization problems by ignoring

the non-negativity constraint (μ  0), which is a typical way to reduce the com-

plexity (Helmbold et al. 1998; Agarwal et al. 2006). To solve this issue that μ can

be negative, we simply project the resulting μ to the simplex domain (Agarwal et al.

2006). In the context of investment, this means that we ﬁrstly allow shorting, and later

lower the leverage

∗

by a simplex projection. Another remaining issue is that, although

the covariance matrix is nonsingular in theory, in real computation,  sometimes may

be singular due to computer precision. To avoid this problem and be consistent with

the projection of μ, we rescale  by normalizing its summation value to

, which

equals the sum of elements in μ

. Note that we arbitrarily choose

, while one

can choose other values, which generally do not affect the performance too much.

The ﬁnal CWMR algorithms are presented in Algorithm 10.1, and OLPS with both

deterministic and stochastic CWMR algorithms is illustrated in Algorithm 10.2.

The algorithms have two possible parameters, that is, conﬁdence parameter φ and

mean reversion parameter . Typically, the ﬁrst parameter, φ, can be 1.28, 1.64, 1.95,

or 2.57, with corresponding θ values of 80%, 90%, 95%, or 99%. As we have tested,

φ does not affect the ﬁnal performance too much. On the contrary,  has a signiﬁcant

impact on the ﬁnal performance. As our model is long-only,

†

we put more weights on

the poor-performing assets; thus,  is often in the range of [0, 1]. On the one hand, if the

value is too large, such as  ≥ 1.2, the last portfolio distribution can always satisfy

the constraint and requires no updates. In such a situation, with an initial uniform

portfolio, CWMR will degrade to uniform CRP. On the other hand, if the value is

too small, such as  ≤ 0.5, the last distribution can always dissatisfy the constraint

and has to be frequently updated. In between, CWMR updates the distribution when

the last distribution cannot satisfy the constraint. We will further validate the above

analysis by evaluating its parameter effect in Section 13.3.3.

∗

In investment, notional leverage denotes total holding assets plus total notional amount of liability

divided by equity. If shorting is allowed, the notional leverage equals



| to 1. The problem setting in

our study is long-only, in which we do not allow shoring/margin; thus, the leverage is always 1 to 1.

†

Long-only means no shoring/margin is allowed; thus, the notional leverage is always 1 to 1.

T&F Cat #K23731 — K23731_C010 — page 77 — 9/28/2015 — 21:24

78 CONFIDENCE-WEIGHTED MEAN REVERSION

Algorithm 10.1: Conﬁdence-Weighted Mean Reversion:

CWMR(φ, ,(μ

, 

), x

,t).

Input: φ: Conﬁdence parameter;  ∈[0, 1]: Mean reversion parameter;

(μ

, 

): Current portfolio distribution; x

: Historical market

sequence; t: Index of current trading period.

Output: (μ

t+1

, 

t+1

): Next portfolio distribution.

begin

Calculate the following variables:

= μ



= x



= x



1, ¯x



Update the portfolio distribution:

CWMR-Var

⎧

⎪

⎨

⎪

⎩

t+1

as in Equation B.10 in Appendix B.3.1

t+1

= μ

−λ

t+1



−¯x



t+1

= (

−1

+2λ

t+1

φdiag

))

−1

CWMR-Stdev

⎧

⎪

⎨

⎪

⎩

t+1

as in Equation B.14 in Appendix B.3.2

√

−λ

t+1

φV

t+1

+4V

t+1

= μ

−λ

t+1



−¯x



t+1

= (

−1

+λ

t+1

√

diag

))

−1

Normalize μ

t+1

and 

t+1

= arg min

μ∈

μ −μ

t+1



, 

t+1



t+1

mTr(

t+1

)

end

10.4 Analysis

In this section, we analyze and interpret the proposed algorithms. Firstly, we compare

CWMR with CW learning (Crammer et al. 2008; Dredze et al. 2008). Then, we

analyze CWMR’s update schemes, that is, μ and , with running examples. Further,

we describe the behavior of stochastic CWMR. Finally, we show its computational

time and compare it with existing work.

The proposed CWMR algorithms are partially motivated by CW learning, thus

their formulations and subsequent derivations are similar. However, they address

different problems, as CWMR handles OLPS while CW focuses on classiﬁcation.

Although both objectives adopt KL divergence to measure the closeness between two

distributions, their constraints reﬂect that they are oriented toward different problems.

To be speciﬁc, CW’s constraint is the probability of a correct classiﬁcation, while

T&F Cat #K23731 — K23731_C010 — page 78 — 9/28/2015 — 21:24

ANALYSIS 79

Algorithm 10.2: Online portfolio selection with CWMR.

Input: φ = 

−1

(θ): Conﬁdence parameter;  ∈[0, 1]: Mean reversion

parameter; x

: Historical market sequence.

Output: S

: Final cumulative wealth.

begin

Initialization: t = 1, μ

1, 

I, S

= 1;

for t = 1,...,ndo

Draw a portfolio b

from N(μ

, 

Deterministic CWMR : b

= μ

Stochastic CWMR :

∼ N(μ

, 

), b

= arg min

b∈

b −



Receive stock price relatives: x

= (x

,...,x

);

Calculate the daily return and cumulative return: S

= S

t−1

×(b



);

Update the portfolio distribution:

(μ

t+1

, 

t+1

) = CWMR(φ, ,(μ

, 

), x

,t);

end

CWMR’s constraints are the probability of an underperforming portfolio plus the

simplex constraint. If there is mean reversion, the portfolio should be a proﬁtable

one, in the next period. Their formulations’ differences result in subsequent different

derivations.

Then, we provide a preliminary analysis on μ, which is the main concern for

CWMR, to reﬂect its underlying mean reversion idea. Both CWMR-Var and CWMR-

Stdev share the same update on μ, that is,

t+1

= μ

−λ

t+1



−¯x

1).

Straightforwardly, we can rewrite its term as μ

t+1,i

= μ

t,i

−λ

t+1

t,i

−¯x

). Obvi-

ously, λ

t+1

is non-negative and 

is PSD. The term x

−¯x

1 denotes excess return

vector for period t, where ¯x

is the CW average of x

. Holding other terms constant,

t+1

tends to move toward μ

, while the magnitude is negatively related to the last

excess return, which is the mean reversion principle. Meanwhile, these movements

are dynamically adjusted by λ

t+1

, the last covariance matrix 

and mean μ

, which

catch both ﬁrst- and second-order information. To the best of our knowledge, none

of the existing algorithms has explicitly exploited the second-order information of b,

even though the second-order information could beneﬁt the proposed algorithms.

Let us continue to analyze . With only nonzero diagonal elements, we can write

the update of the i-th variance as

= σ

/(1 +λ

t+1



T&F Cat #K23731 — K23731_C010 — page 79 — 9/28/2015 — 21:24

80 CONFIDENCE-WEIGHTED MEAN REVERSION

where φ



= 2φ for CWMR-Var and φ



√

for CWMR-Stdev. Since both λ

t+1

and φ



are positive, poor-performing assets (with lower values of x

t,i

) have higher

variance terms than good-performing ones (with higher x

t,i

). Note that  denotes the

covariance matrix of brather than x.Thus,ahigher value means that the corresponding

mean is more volatile than others. Since we move the weights from good-performing

assets to poor-performing ones, the latter would change more than the former, that

is, the latter has higher volatility. In the next update of μ, assets with high volatility

would actively magnify the movement magnitude.

To better illustrate the updates, we give running updates based on a classic exam-

ple (Cover and Gluss 1986). Let a market consist of cash and one volatility asset, and

the sequence of x is







1, 2







,.... Obviously, market strategy can gain

nothing since no asset grows in the long run. The best CRP strategy, with b =





grows to





at the end of n periods. However, starting with μ





, the CWMR

strategy can grow to

×2

n−1

after n-th periods. Table 10.2 shows the running details

for the initial ﬁve periods, and further details can be derived. On each period t +1,

the mean moves toward the last mean and also moves far away by the excess return

vector (x

−¯x

1), and its magnitude is determined by both λ

and 

. Note that, in

this example, μ before projection is out of the simplex domain and is forced sparse

via normalization, which is not an usual case in real tests. In summary, both the ﬁrst-

and second-order information contribute to CWMR’s success.

Then, let us compare deterministic CWMR with the stochastic version (Line 4

in Algorithm 10.2), which draws a portfolio based on both the mean and covariance

matrix. Interestingly,  negatively affects CWMR’s performance in several aspects.

Firstly, a stochastic b drawn from the distribution is always different from the optimal

mean μ, which obviously causes performance divergences. Given that  converges

to the zero matrix (see the recursive updates in the two propositions), the distribution

of b conditioning on the data converges to the point mass at the mean parameter

value, μ = lim

. Thus, drawing weights b from the distribution (the stochastic

version) is suboptimal, since we already have an estimate of μ. It is better to choose b

as either the mode or mean (incidentally, the same for the Gaussian case), which is

actually deterministic CWMR. Another effect caused by the stochastic behavior is

Table 10.2 A running example of CWMR-Stdev on the Cover’s game

t x



−¯x

1 diag(

) μ

0 (0.25, 0.25) (0.5, 0.5)

1 (1.0, 0.5) (0.5, 0.5) 0.75 40.78 (0.25, −0.25) (0.10, 0.40) (0.0, 1.0)

2 (1.0, 2.0) (0.0, 1.0) 2.00 61.61 (−0.80, 0.20) (0.40, 0.10) (1.0, 0.0)

3 (1.0, 0.5) (1.0, 0.0) 1.00 75.56 (0.10, −0.40) (0.10, 0.40) (0.0, 1.0)

4 (1.0, 2.0) (0.0, 1.0) 2.00 61.61 (−0.80, 0.20) (0.40, 0.10) (1.0, 0.0)

5 (1.0, 0.5) (1.0, 0.0) 1.00 75.56 (0.10, −0.40) (0.10, 0.40) (0.0, 1.0)

T&F Cat #K23731 — K23731_C010 — page 80 — 9/28/2015 — 21:24

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Chapter 10 - Confidence-Weighted Mean Reversion (2/3)

Create new playlist

Sign In

Sign Up

Table of Contents for
Chapter 10 - Confidence-Weighted Mean Reversion (2/3)