 • Search in book...
• Toggle Font Controls 76 CONFIDENCE-WEIGHTED MEAN REVERSION
The Final Optimization Problem 2: CWMR-Stdev
(μ
t+1
, ϒ
t+1
) = arg min
1
2
log
detϒ
2
t
detϒ
2
+Tr(ϒ
2
t
ϒ
2
) +(μ
t
μ)
ϒ
2
t
(μ
t
μ)
s.t. μ
x
t
φϒx
t
, ϒ is PSD
μ
1 = 1, μ 0. (10.4)
Clearly, the revised optimization problem (10.2) is equivalent to the raw optimiza-
tion problem (10.1). From the revised problem, we proposed two ﬁnal optimization
problems, Equations 10.3 and 10.4, which are convex and thus can be efﬁciently
solved by convex optimization (Boyd and Vandenberghe 2004). The ﬁrst variation,
CWMR-Var, linearizes the constraint; thus, it results in an approximate solution for the
revised and the raw optimizations. In contrast, the second variation, CWMR-Stdev, is
equivalent to the revised optimization problem (10.2) and results in an exact solution
for both the revised and raw optimization problems.
Remarks on Formulations: Note that the short version of this chapter (Li et al.
2011b) assumes log utility (Bernoulli 1954; Latané 1959) on μ
x
t
and is slightly
different from this version (Li et al. 2013). Since both and φ are adjustable, they have
similar effects on μ. Assuming other parameters are constant except μ,asμ
x
t
>
log μ
x
t
, the current linear form can move μ toward the mean reversion proﬁtable
portfolio more than the log form can. However, the log form in this constraint causes
another convexity issue besides the standard deviation on the right-hand side. To solve
the optimization problem with log, Li et al. (2011b) chose to replace the log term by
its linear approximation, which may converge to a different solution. Moreover, the
current form and log’s linear approximation are essentially the same.
return without log, which has no above convexity issues concerning log and its linear
approximation.
10.3 Algorithms
Now, let us devise the proposed algorithms based on the optimization problem,
Equations 10.3 and 10.4. Their solutions are shown in Propositions 10.1 and 10.2,
respectively. Both proofs are presented in Appendices B.3.1 and B.3.2, respectively.
Proposition 10.1 The solution to the ﬁnal optimization problem (10.3) (CWMR-Var)
without considering the non-negativity constraint (μ 0) is expressed as
μ
t+1
= μ
t
λ
t+1
t
(x
t
−¯x
t
1),
1
t+1
=
1
t
+2λ
t+1
φx
t
x
t
,
where λ
t+1
corresponds to the Lagrangian multiplier calculated as Equation B.10 in
Appendix B.3.1 and ¯x
t
=
1
t
x
t
1
t
1
denotes the CW average of x
t
.
Current form is μ
x
t
. Li et al. (2011b) uses approximation,that is, log μ
x
t
log μ
t
x
t
+
x
t
·(μμ
t
)
μ
t
x
t
.
Both terms are linear, although their scales are different.
T&F Cat #K23731 — K23731_C010 — page 76 — 9/28/2015 — 21:24 ALGORITHMS 77
Proposition 10.2 The solution to the ﬁnal optimization problem (10.4) (CWMR-
Stdev) without considering the non-negativity constraint (μ 0) is expressed as
μ
t+1
= μ
t
λ
t+1
t
(x
t
−¯x
t
1),
1
t+1
=
1
t
+λ
t+1
φ
x
t
x
t
U
t
,
where λ
t+1
denotes the Lagrangian multiplier calculated as Equation B.14 in
Appendix B.3.2, ¯x
t
=
1
t
x
t
1
t
1
represents the CW average of x
t
, and V
t
= x
t
t
x
t
and
U
t
=
λ
t+1
φV
t
+
,
λ
2
t+1
φ
2
V
2
t
+4V
t
2
denote the return variances for period t and t +1,
respectively.
Initially, with no information available for the task, we simply initialize μ
1
to
uniform and each diagonal element of the covariance matrix
1
to
1
m
2
, or equiva-
lent standard deviation
1
m
. Note that we solve the optimization problems by ignoring
the non-negativity constraint (μ 0), which is a typical way to reduce the com-
plexity (Helmbold et al. 1998; Agarwal et al. 2006). To solve this issue that μ can
be negative, we simply project the resulting μ to the simplex domain (Agarwal et al.
2006). In the context of investment, this means that we ﬁrstly allow shorting, and later
lower the leverage
by a simplex projection. Another remaining issue is that, although
the covariance matrix is nonsingular in theory, in real computation, sometimes may
be singular due to computer precision. To avoid this problem and be consistent with
the projection of μ, we rescale by normalizing its summation value to
1
m
, which
equals the sum of elements in μ
1
. Note that we arbitrarily choose
1
m
, while one
can choose other values, which generally do not affect the performance too much.
The ﬁnal CWMR algorithms are presented in Algorithm 10.1, and OLPS with both
deterministic and stochastic CWMR algorithms is illustrated in Algorithm 10.2.
The algorithms have two possible parameters, that is, conﬁdence parameter φ and
mean reversion parameter . Typically, the ﬁrst parameter, φ, can be 1.28, 1.64, 1.95,
or 2.57, with corresponding θ values of 80%, 90%, 95%, or 99%. As we have tested,
φ does not affect the ﬁnal performance too much. On the contrary, has a signiﬁcant
impact on the ﬁnal performance. As our model is long-only,
we put more weights on
the poor-performing assets; thus, is often in the range of [0, 1]. On the one hand, if the
value is too large, such as 1.2, the last portfolio distribution can always satisfy
the constraint and requires no updates. In such a situation, with an initial uniform
portfolio, CWMR will degrade to uniform CRP. On the other hand, if the value is
too small, such as 0.5, the last distribution can always dissatisfy the constraint
and has to be frequently updated. In between, CWMR updates the distribution when
the last distribution cannot satisfy the constraint. We will further validate the above
analysis by evaluating its parameter effect in Section 13.3.3.
In investment, notional leverage denotes total holding assets plus total notional amount of liability
divided by equity. If shorting is allowed, the notional leverage equals
i
|b
i
| to 1. The problem setting in
our study is long-only, in which we do not allow shoring/margin; thus, the leverage is always 1 to 1.
Long-only means no shoring/margin is allowed; thus, the notional leverage is always 1 to 1.
T&F Cat #K23731 — K23731_C010 — page 77 — 9/28/2015 — 21:24 78 CONFIDENCE-WEIGHTED MEAN REVERSION
Algorithm 10.1: Conﬁdence-Weighted Mean Reversion:
CWMR(φ, ,(μ
t
,
t
), x
t
1
,t).
Input: φ: Conﬁdence parameter; ∈[0, 1]: Mean reversion parameter;
(μ
t
,
t
): Current portfolio distribution; x
t
1
: Historical market
sequence; t: Index of current trading period.
Output: (μ
t+1
,
t+1
): Next portfolio distribution.
begin
Calculate the following variables:
M
t
= μ
t
x
t
,V
t
= x
t
t
x
t
,W
t
= x
t
t
1, ¯x
t
=
1
t
x
t
1
t
1
Update the portfolio distribution:
CWMR-Var
λ
t+1
as in Equation B.10 in Appendix B.3.1
μ
t+1
= μ
t
λ
t+1
t
(x
t
−¯x
t
1)
t+1
= (
1
t
+2λ
t+1
φdiag
2
(x
t
))
1
CWMR-Stdev
λ
t+1
as in Equation B.14 in Appendix B.3.2
U
t
=
λ
t+1
φV
t
+
,
λ
2
t+1
φ
2
V
2
t
+4V
t
2
μ
t+1
= μ
t
λ
t+1
t
(x
t
−¯x
t
1)
t+1
= (
1
t
+λ
t+1
φ
U
t
diag
2
(x
t
))
1
Normalize μ
t+1
and
t+1
:
μ
t+1
= arg min
μ
m
μ μ
t+1
2
,
t+1
=
t+1
mTr(
t+1
)
end
10.4 Analysis
In this section, we analyze and interpret the proposed algorithms. Firstly, we compare
CWMR with CW learning (Crammer et al. 2008; Dredze et al. 2008). Then, we
analyze CWMR’s update schemes, that is, μ and , with running examples. Further,
we describe the behavior of stochastic CWMR. Finally, we show its computational
time and compare it with existing work.
The proposed CWMR algorithms are partially motivated by CW learning, thus
their formulations and subsequent derivations are similar. However, they address
different problems, as CWMR handles OLPS while CW focuses on classiﬁcation.
Although both objectives adopt KL divergence to measure the closeness between two
distributions, their constraints reﬂect that they are oriented toward different problems.
To be speciﬁc, CW’s constraint is the probability of a correct classiﬁcation, while
T&F Cat #K23731 — K23731_C010 — page 78 — 9/28/2015 — 21:24 ANALYSIS 79
Algorithm 10.2: Online portfolio selection with CWMR.
Input: φ =
1
(θ): Conﬁdence parameter; ∈[0, 1]: Mean reversion
parameter; x
n
1
: Historical market sequence.
Output: S
n
: Final cumulative wealth.
begin
Initialization: t = 1, μ
1
=
1
m
1,
1
=
1
m
2
I, S
0
= 1;
for t = 1,...,ndo
Draw a portfolio b
t
from N(μ
t
,
t
):
Deterministic CWMR : b
t
= μ
t
Stochastic CWMR :
˜
b
t
N(μ
t
,
t
), b
t
= arg min
b
m
b
˜
b
t
2
t
= (x
t1
,...,x
tm
);
Calculate the daily return and cumulative return: S
t
= S
t1
×(b
t
x
t
);
Update the portfolio distribution:
(μ
t+1
,
t+1
) = CWMR(φ, ,(μ
t
,
t
), x
t
1
,t);
end
end
CWMR’s constraints are the probability of an underperforming portfolio plus the
simplex constraint. If there is mean reversion, the portfolio should be a proﬁtable
one, in the next period. Their formulations’ differences result in subsequent different
derivations.
Then, we provide a preliminary analysis on μ, which is the main concern for
CWMR, to reﬂect its underlying mean reversion idea. Both CWMR-Var and CWMR-
Stdev share the same update on μ, that is,
μ
t+1
= μ
t
λ
t+1
t
(x
t
−¯x
t
1).
Straightforwardly, we can rewrite its term as μ
t+1,i
= μ
t,i
λ
t+1
σ
2
t
(x
t,i
−¯x
t
). Obvi-
ously, λ
t+1
is non-negative and
t
is PSD. The term x
t
−¯x
t
1 denotes excess return
vector for period t, where ¯x
t
is the CW average of x
t
. Holding other terms constant,
μ
t+1
tends to move toward μ
t
, while the magnitude is negatively related to the last
excess return, which is the mean reversion principle. Meanwhile, these movements
t+1
, the last covariance matrix
t
and mean μ
t
, which
catch both ﬁrst- and second-order information. To the best of our knowledge, none
of the existing algorithms has explicitly exploited the second-order information of b,
even though the second-order information could beneﬁt the proposed algorithms.
Let us continue to analyze . With only nonzero diagonal elements, we can write
the update of the i-th variance as
σ
2
= σ
2
i
/(1 +λ
t+1
φ
x
2
ti
σ
2
i
),
T&F Cat #K23731 — K23731_C010 — page 79 — 9/28/2015 — 21:24 80 CONFIDENCE-WEIGHTED MEAN REVERSION
where φ
= 2φ for CWMR-Var and φ
=
φ
U
t
for CWMR-Stdev. Since both λ
t+1
and φ
are positive, poor-performing assets (with lower values of x
t,i
) have higher
variance terms than good-performing ones (with higher x
t,i
). Note that denotes the
covariance matrix of brather than x.Thus,ahigher value means that the corresponding
mean is more volatile than others. Since we move the weights from good-performing
assets to poor-performing ones, the latter would change more than the former, that
is, the latter has higher volatility. In the next update of μ, assets with high volatility
would actively magnify the movement magnitude.
To better illustrate the updates, we give running updates based on a classic exam-
ple (Cover and Gluss 1986). Let a market consist of cash and one volatility asset, and
the sequence of x is
1,
1
2
,
1, 2
,
1,
1
2
,.... Obviously, market strategy can gain
nothing since no asset grows in the long run. The best CRP strategy, with b =
1
2
,
1
2
,
grows to
9
8
n
2
at the end of n periods. However, starting with μ
0
=
1
2
,
1
2
, the CWMR
strategy can grow to
3
4
×2
n1
2
after n-th periods. Table 10.2 shows the running details
for the initial ﬁve periods, and further details can be derived. On each period t +1,
the mean moves toward the last mean and also moves far away by the excess return
vector (x
t
−¯x
t
1), and its magnitude is determined by both λ
t
and
t
. Note that, in
this example, μ before projection is out of the simplex domain and is forced sparse
via normalization, which is not an usual case in real tests. In summary, both the ﬁrst-
and second-order information contribute to CWMR’s success.
Then, let us compare deterministic CWMR with the stochastic version (Line 4
in Algorithm 10.2), which draws a portfolio based on both the mean and covariance
matrix. Interestingly, negatively affects CWMR’s performance in several aspects.
Firstly, a stochastic b drawn from the distribution is always different from the optimal
mean μ, which obviously causes performance divergences. Given that converges
to the zero matrix (see the recursive updates in the two propositions), the distribution
of b conditioning on the data converges to the point mass at the mean parameter
value, μ = lim
t
μ
t
. Thus, drawing weights b from the distribution (the stochastic
version) is suboptimal, since we already have an estimate of μ. It is better to choose b
as either the mode or mean (incidentally, the same for the Gaussian case), which is
actually deterministic CWMR. Another effect caused by the stochastic behavior is
Table 10.2 A running example of CWMR-Stdev on the Covers game
t x
t
b
t
b
t
x
t
λ
t
x
t
−¯x
t
1 diag(
t
) μ
t
0 (0.25, 0.25) (0.5, 0.5)
1 (1.0, 0.5) (0.5, 0.5) 0.75 40.78 (0.25, 0.25) (0.10, 0.40) (0.0, 1.0)
2 (1.0, 2.0) (0.0, 1.0) 2.00 61.61 (0.80, 0.20) (0.40, 0.10) (1.0, 0.0)
3 (1.0, 0.5) (1.0, 0.0) 1.00 75.56 (0.10, 0.40) (0.10, 0.40) (0.0, 1.0)
4 (1.0, 2.0) (0.0, 1.0) 2.00 61.61 (0.80, 0.20) (0.40, 0.10) (1.0, 0.0)
5 (1.0, 0.5) (1.0, 0.0) 1.00 75.56 (0.10, 0.40) (0.10, 0.40) (0.0, 1.0)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
T&F Cat #K23731 — K23731_C010 — page 80 — 9/28/2015 — 21:24
• No Comment
..................Content has been hidden....................