Chapter 4
Follow the Winner
The first principle, follow the winner, is characterized by increasing the weights of
more successful experts or stocks. Rather than targeting market and best stock, algo-
rithms in this category often aim to track the BCRP strategy, that is, their target is to
be universal.
This chapter is organized as follows. Section 4.1 introduces Cover’s universal
portfolios (UP) algorithm, and Section 4.2 details the exponential gradient (EG) algo-
rithm. Sections 4.3 and 4.4 introduce the follow the leader (FTL) and follow the
regularized leader (FTRL) approaches, respectively. Finally, Section 4.5 summarizes
the follow the winner principle.
4.1 Universal Portfolios
The basic idea of Universal Portfolios–type (UP-type) algorithms is to assign the
capital to base experts of a single class, let the experts run, and finally pool their wealth.
They are analogous to the Buy-and-Hold (BAH) strategy. In particular, BAH’s base
experts belong to a special strategy investing on a single asset, and thus the number
of experts equals that of assets. In other words, BAH buys individual stocks, lets
the stocks go, and finally pools their individual wealth. On the other hand, the base
experts in the UP-type algorithms can be any single strategy class that invests over the
whole markets. Besides, UP-type algorithms are also similar to the meta-algorithms
(MA) in Chapter 7, while the latter applies to base experts of multiple classes.
Cover (1991) proposed the universal portfolio strategy, and Cover and Ordentlich
(1996) further refined the algorithm as μ-weighted universal portfolio, in which μ
denotes a given distribution on the space of valid portfolio
m
. Intuitively, Covers
UP operates similar to a fund of funds (FOF),
and its main idea is to buy and hold
parameterized CRP strategies over the whole simplex domain. In particular, it initially
invests a proportion of wealth dμ(b) to each portfolio manager operating CRPstrategy
with b
m
, and lets the CRP managers run. Then, at the end, each manager will
grow his wealth to S
n
(b)dμ(b). Finally, Covers UP pools the individual experts’
An FOF holds a portfolio of other investment funds, rather than directly investing in stocks, futures,
etc.
23
T&F Cat #K23731 — K23731_C004 — page 23 — 9/28/2015 — 21:07
24 FOLLOW THE WINNER
wealth over the continuum of portfolio strategies. Note that S
n
(b) = e
nW
n
(b)
, which
means that the portfolio grows at an exponential rate of W
n
(b).
Formally, its update scheme (Cover and Ordentlich 1996, Definition 1) can
be interpreted as a historical performance-weighted average of all valid constant
rebalanced portfolios,
b
t+1
=
m
bS
t
(b)dμ(b)
m
S
t
(b)dμ(b)
.
Note that at the beginning of period t +1, one CRP managers wealth (historical
performance) equals S
t
(b)dμ(b). Incorporating the initial wealth of S
0
= 1, the final
cumulative wealth is the weighted average of CRP managers’ wealth (Cover and
Ordentlich 1996, Eq. (24)):
S
n
(UP) =
m
S
n
(b)dμ(b). (4.1)
One special case is that μ equals a uniform distribution; the portfolio update reduces
to Covers UP (Cover 1991, Eq. (1.3)). Another special cases is Dirichlet
1
2
,...,
1
2
weighted UP (Cover and Ordentlich 1996), which is proved to be a more optimal
allocation.
Alternatively, if a loss function is defined as the negative logarithmic function
of portfolio return, Covers UP is actually an exponentially weighted average fore-
caster (Cesa-Bianchi and Lugosi 2006). The regret (Cover 1991) achieved by Covers
UP is O(m log n), and its time complexity is O(n
m
), where m denotes the number
of stocks and n refers to the number of periods. Cover and Ordentlich (1996) proved
that the
1
2
,...,
1
2
weighted UP has the same scale of regret bound, but a better
constant term (Cover and Ordentlich 1996, Theorem 2).
As Cover’s UP is based on an ideal market model, one research direction is to
extend the algorithm to handle various realistic assumptions. Cover and Ordentlich
(1996) considered side information, including experts’ opinions and fundamental
data. Cover and Ordentlich (1998) extended the algorithm to handle short selling and
margin, and Blum and Kalai (1999) took account of transaction costs.
Another research direction is to generalize Covers UP with different base classes,
rather than the CRP strategy. Jamshidian (1992) generalized the algorithm for con-
tinuous time markets and presented its long-term performance. Vovk and Watkins
(1998) applied the aggregating algorithm (AA) (Vovk 1990) to a finite number of
arbitrary investment strategies, of which Covers UP becomes a specialized case
when applied to an infinite number of CRPs. Ordentlich and Cover (1998) analyzed
the minimal ratio of final wealth achieved by any nonanticipating investment strategy
to that of BCRP and presented a strategy to achieve such an optimal ratio. Cross and
Barron (2003) generalized Covers UP from the CRP strategy class to any parameter-
ized target class and proposed a computation favorable universal strategy. Akcoglu
et al. (2005) extended Covers UP from a parameterized CRP class to a wide class
of investment strategies, including trading strategies operating on a single stock and
portfolio strategies operating on the whole stock market. Kozat and Singer (2011)
proposed a similar universal algorithm based on the class of semiconstant rebalanced
T&F Cat #K23731 — K23731_C004 — page 24 — 9/28/2015 — 21:07
EXPONENTIAL GRADIENT 25
portfolios (Helmbold et al. 1998), which provides good asymptotic performance in
case of nonzero transaction costs.
Besides our intuitive analysis, various works have been proposed to discuss the
connection between Covers UP with universal prediction (Feder et al. 1992), data
compression (Rissanen 1983), and Markowitz’s mean variance theory (Markowitz
1952). Algoet (1992) discussed a universal scheme for prediction, gambling, and
portfolio selection. Cover (1996) and Ordentlich (1996) discussed the connection
between UP selection and data compression. Belentepe (2005) presented a statistical
view of Covers UP strategy and claimed its approximate equivalence to the mean
variance portfolio theory.
Although Covers UP has a tight regret bound, its implementation is exponential
to the number of stocks, which restricts its practical applicability. To handle the
computational issue, Kalai and Vempala (2002) presented an efficient implementation
based on rapidly mixing nonuniform random walks, improving the running time from
original O(n
m
) to O(m
7
n
8
).
4.2 Exponential Gradient
Algorithms in the exponential gradient type (EG type) focus on the following
optimization formulation:
b
t+1
= arg max
b
m
η log b ·x
t
R(b, b
t
), (4.2)
where R(b, b
t
) denotes a regularization term and η > 0 denotes a learning rate. One
straightforward interpretation is to track the best stock in the last period while keeping
previous portfolio information via a regularization term.
Helmbold et al. (1998) proposed the EG strategy, which is based on the same
algorithm for mixture estimation (Helmbold et al. 1997). Following Equation 4.2,
EG adopts relative entropy as its regularization term, that is,
R(b, b
t
) =
m
i=1
b
i
log
b
i
b
t,i
.
EG’s formulation is convex in b; however, it is hard to solve since the log function is
nonlinear. Thus, the authors adopted log’s first-order Taylor expansion at b
t
, that is,
log b·x
t
log(b
t
·x
t
) +
x
t
b
t
·x
t
(b b
t
).
Then the nonlinear log term becomes linear and the optimization is easy to solve.
Solving the optimization, we can obtain EG’s update rule as
b
t+1,i
= b
t,i
exp
η
x
t,i
b
t
·x
t
Z, i = 1,...,m,
where Z denotes the normalization term such that the portfolio weights sum to 1.
T&F Cat #K23731 — K23731_C004 — page 25 — 9/28/2015 — 21:07
26 FOLLOW THE WINNER
Besides the multiplicative update rule (EG), the optimization problem can also
be solved using the gradient projection (GP) and expectation–maximization (EM)
(Helmbold et al. 1997). Rather than EG’s relative entropy, GP adopts an L2-norm
regularization, and EM adopts an χ
2
regularization term, that is,
R(b, b
t
) =
1
2
m
i=1
(b
i
b
t,i
)
2
GP
1
2
m
i=1
(b
i
b
t,i
)
2
b
t,i
EM
.
Solving corresponding optimization problems, we can obtain GP’s update rule as
b
t+1,i
= b
t,i
+η
x
t,i
b
t
·x
t
1
m
m
j=1
x
t,j
b
t
·x
t
,
and EM’s update rule as
b
t+1,i
= b
t,i
η
x
t,i
b
t
·x
t
1
+1
.
The latter can also be viewed as EG’s first-order approximation.
One key parameter for the EG-type algorithms is the learning rate η. To achieve a
universal regret bound, η has to be small. However, as η 0, its update approaches
uniform,
which degrades to UCRP. Such an analysis will be empirically verified in
Section 13.6.
EG has a regret bound of O(
n log m) and a running time of O(mn). The regret
is not as tight as Cover’s UP; however, its linear time substantially surpasses that of
UP. Besides, the authors also proposed a variant by transforming all price relatives,
which has a tight regret bound of O(m log n). Though not officially proposed for
online portfolio selection (Helmbold et al. 1997), GP can straightforwardly achieve
a regret of O(
mn), which is significantly worse than EG.
Das and Banerjee (2011) generalized EG-type algorithms to an MA named online
gradient updates, which combine underlying experts such that the overall system
performs no worse than any convex combination of its base experts.
4.3 Follow the Leader
FTL strategies directly track the BCRP till time t:
b
t+1
= b
t
= arg max
b
m
t
τ=1
log(b ·x
τ
). (4.3)
Intuitively, this category follows the BCRP leader over the known periods, and the
ultimate leader is BCRP over the whole periods.
In case of η = 0, b
t+1,i
= b
t,i
=···=b
1,i
=
1
m
.
T&F Cat #K23731 — K23731_C004 — page 26 — 9/28/2015 — 21:07
FOLLOW THE REGULARIZED LEADER 27
Ordentlich (1996, Chapter 4.4) briefly mentioned a strategy to obtain portfolios
by mixing BCRP to date and uniform portfolio:
b
t+1
=
t
t +1
b
t
+
1
t +1
1
m
1.
However, its worst-case regret bound is worse than that of Covers UP.
Gaivoronski and Stella (2000) proposed successive constant rebalanced port-
folios (SCRPs) and weighted successive constant rebalanced portfolios (WSCRPs)
for stationary markets. For each period, SCRP directly adopts the BCRP to date,
b
t+1
= b
t
. The authors further solved the optimal portfolio b
t
via stochastic opti-
mization (Birge and Louveaux 1997), resulting in the updates (Gaivoronski and Stella
2000, Algorithm 1). On the other hand, WSCRP outputs a convex combination of the
SCRP and previous portfolio:
b
t+1
= (1 γ)b
t
+γb
t
,
where γ ∈[0, 1] represents a trade-off parameter. The regret bounds achieved by
SCRP and WSCRP are O(mlog n), which is the same as that of Covers UP.
Rather thanassuminga stationary market, somealgorithmsin this categoryassume
that the historical market is nonstationary. Gaivoronski and Stella (2000) proposed
variable rebalanced portfolios (VRP), which calculate the BCRP on a latest sliding
window. To be specific, VRP updates the portfolio as
b
t+1
= arg max
b
m
t
τ=tW +1
log(b ·x
τ
),
where W denotes a specified window size.
Gaivoronski and Stella (2003) further proposed adaptive portfolio selection
(APS). By changing the objective, APS can handle three portfolio selection tasks,
that is, adaptive Markowitz portfolio, log-optimal constant rebalanced portfolio, and
index tracking. To handle the transaction cost issue, they further proposed threshold
portfolio selection, which only rebalances the portfolio if the expected return of a new
portfolio exceeds that of the last portfolio by a threshold.
4.4 Follow the Regularized Leader
The FTRL approach adds a regularization term to Equation 4.3:
b
t+1
= arg max
b
m
t
τ=1
log(b ·x
τ
)
β
2
R(b), (4.4)
where β denotes a trade-off parameter and R(b) is the regularization term on b. Note
that the first term includes all historical information; thus, the regularization term only
relates to the next portfolio, which is different from the EG algorithm. One typical
regularization is L2-norm, that is, R(b) =b
2
.
T&F Cat #K23731 — K23731_C004 — page 27 — 9/28/2015 — 21:07
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset