• Search in book...
• Toggle Font Controls
Chapter 2
Problem Formulation
This chapter introduces the problem setting of online portfolio selection (OLPS)
and formally formulates the problem mathematically as a sequential decision task.
We further relax the problem setting by adding two practical constraints: transaction
costs and margin buying. Finally, we introduce the idea of how to evaluate a strategy’s
performance.
Speciﬁcally, this chapter is organized as follows. Section 2.1 formally formulates
the OLPS task as a sequential decision problem. Section 2.2 relaxes the transaction
costs and margin buying constraints. Section 2.3 introduces several evaluation metrics
for the task. Finally, Section 2.4 summarizes this chapter.
2.1 Problem Settings
Consider an OLPS task: assume an investor aims to invest his capital on a ﬁnite
number of m 2 investment assets
for a ﬁnite number of n 1 trading periods.
At the t-th period (t = 1,...,n), the asset (close) prices are represented by a
vector p
t
R
m
+
, and each element p
t,i
,i = 1,...,m, represents the close price of
asset i. Their price changes are represented by a price relative vector x
t
R
m
+
, each
component of which denotes the ratio of the t-th close price to the last close price,
that is, x
t,i
=
p
t,i
p
t1,i
.
Thus, an investment in asset i throughout period t increases
by a factor of x
t,i
.
§
Let us denote x
n
1
={x
1
,...,x
n
} as a sequence of price relative
vectors for n periods, and x
e
s
={x
s
,...,x
e
}, 1 s<e n as a market window of
price relative vectors ranging from period s to period e.
An investment in the market for the t-th period is speciﬁed by a portfolio vector
b
t
= (b
t,1
,...,b
t,m
), where b
t,i
,i = 1,...,m, represents the proportion of wealth
invested in asset i at the beginning of the t-th period. Typically, the portfolio is
When m = 1, the problem is reduced to single-stock trading, which is out of the scope of this book.
A period can be a week, a day, an hour, or even a second in high-frequency trading.
Here we adopt simple gross return, while one may choose simple net return, i.e.,
p
t,i
p
t1,i
p
t1,i
.
For the calculation of the ﬁrst period, suppose we have p
0,i
.
§
For example, x
t,i
= 2 means that the investment on an asset will increase by 100%, or double its
initial investment. x
t,i
= 1 means that the capital will remain its initial capital.
11
T&F Cat #K23731 — K23731_C002 — page 11 — 9/28/2015 — 21:06
12 PROBLEM FORMULATION
self-ﬁnanced and no margin/short is allowed; therefore, each entry of a portfolio
is nonnegative and adds up to one, that is, b
t
m
, where
m
=
b
t
: b
t
0,
m
i=1
b
t,i
= 1
.
The investment procedure is represented by a portfolio strategy,
that is, b
1
=
1
m
,...,
1
m
, and the following sequence of mappings:
b
t
: R
m(t1)
+
m
,t = 2, 3,...,
where b
t
= b
t
x
t1
1
is the portfolio determined at the beginning of the t-th period
upon observing past market behaviors. We denote by b
n
1
={b
1
,...,b
n
} the strategy
for n periods, which is the output of an OLPS strategy.
At the t-th period, a portfolio b
t
produces a portfolio period return s
t
, that is, the
wealth increases by a factor of s
t
= b
t
x
t
=
m
i=1
b
t,i
x
t,i
.
Since we reinvest and
adopt relative prices, the wealth would grow multiplicatively. Thus, after n periods,
a portfolio strategy b
n
1
will produce a portfolio cumulative wealth of S
n
, which
increases the initial wealth by a factor of
n
t=1
b
t
x
t
, that is,
S
n
(b
n
1
, x
n
1
) = S
0
n
t=1
b
t
x
t
,
where S
0
denotes initial wealth and is usually set to \$1 for convenience.
We present the framework of the above task in Protocol 2.1. In this task, a portfolio
managers goal is to produce a portfolio strategy (b
n
1
) upon the market price relatives
(x
n
1
), aiming to achieve certain targets. The manager computes the portfolios in a
sequential manner. For each period t, the manager has access to the sequence of
past price relative vectors x
t1
1
. The manager computes a new portfolio b
t
for next
price relative vector x
t
, where the decision criterion varies among different managers.
Then traders will rebalance to the new portfolio, via buying and selling the underlying
stocks.At the end of a trading day, the market will reveal x
t
. The resulting portfolio b
t
is scored based on portfolio period return s
t
. This procedure is repeated until the ﬁnal
period, and the portfolio strategy is scored by its portfolio cumulative wealth S
n
.
It is important to note that we make several general and common assumptions in
the above model:
1. Transaction cost: no explicit or implicit transaction costs
exist.
2. Market liquidity: one can buy and sell the required amount, even fractional, at the
last close price of any given trading period.
3. Market impact: any portfolio selection strategy shall not inﬂuence the market or
any other stocks’ prices.
0 denotes that each element of the vector is nonnegative.
For example, Bin buys MSFT with 50% of his capital (\$5000), GS with 30% of his capital (\$3000),
and a T-bill with the remaining 20% (\$2000). If MSFT goes up by a factor of 2, GS goes down by a
factor of 0.5, and the T-bill remains 1. Then his capital will increase by a factor of 0.5 ×2 +0.3 ×0.5 +
0.2 ×1 = 1.35, or increase by 35%.
Explicit costs include commissions, taxes, stamp duties, and fees. Implicit costs include the bid–ask
spread, opportunity costs, and slippage costs.
T&F Cat #K23731 — K23731_C002 — page 12 — 9/28/2015 — 21:06
TRANSACTION COSTS AND MARGIN BUYING MODELS 13
Protocol 2.1: Online portfolio selection.
Input: x
n
1
: Historical market price relative sequence
Output: S
n
: Final cumulative wealth
Initialize S
0
= 1, b
1
=
1
m
,...,
1
m
for t = 1, 2,...,ndo
Portfolio manager learns a portfolio b
t
;
Market reveals a price relative vector x
t
;
Portfolio incurs period return s
t
= b
t
x
t
S
t
= S
t1
×
b
t
x
t
;
Portfolio manager updates his or her decision rules;
end
The preceding assumptions are nontrivial. We will further analyze and discuss their
implications and effects for our empirical studies in Sections 13.4 and 14.1.
Finally, as we are going to design intelligent learning algorithms that ﬁt the above
model, let us ﬁx the objective of the proposed learning algorithms. For a portfolio
Sharpe 1964) or to maximize cumulative return (Kelly 1956; Thorp 1971) at the end
of a period. While the model is online, which contains multiple periods, we choose
to maximize the cumulative return (Hakansson 1971),
which is also the objective of
most existing algorithmic studies.
2.2 Transaction Costs and Margin Buying Models
While our model is concise and simple to understand, it ignores some practical issues
in real trading scenarios. We now relax two constraints to address these issues.
In reality, an important and unavoidable issue is the transaction cost, which
includes the commission fees and taxes imposed by brokers and governments, dur-
ing the rebalance activities.
Note that the transaction cost is imposed by markets,
and a portfolio’s behavior cannot change the properties of transaction costs, such
as commission rates or tax rates. To handle the issue, the ﬁrst way, which is com-
monly adopted by existing strategies, is that a portfolio selection model does not
take transaction costs into account, and the second way is to directly integrate the
costs in the model (Györﬁ and Vajda 2008). In this book, we take the ﬁrst way and
adopt the simpliﬁed proportional transaction cost model (Blum and Kalai 1999;
Borodin et al. 2004).
To be speciﬁc, rebalancing a portfolio incurs transaction costs
However, such objective does not prohibit us from comparing different strategies via risk-adjusted
terms, such as the Sharpe ratio and Calmar ratio.
Besides commission and taxes, some other factors, such as bid–ask spreads, also implicitly incur
transaction costs to a portfolio.
Blum and Kalai (1999, p. 195) also provide the precise model, with which our algorithms’ per-
formance is almost the same as the simpliﬁed one, but the precise model costs much more time to
compute.
T&F Cat #K23731 — K23731_C002 — page 13 — 9/28/2015 — 21:06
14 PROBLEM FORMULATION
on every buy and sell operation, based upon a transaction cost rate of γ (0, 1).
At the beginning of period t, the portfolio manager rebalances his or her wealth to
a new portfolio b
t
, from last close price adjusted portfolio
ˆ
b
t1
, each component of
which is calculated as
ˆ
b
t1,i
=
b
t1,i
×x
t1,i
b
t1
x
t1
. Such rebalance incurs a transaction cost
of
γ
2
×
m
i=1
b
t,i
ˆ
b
t1,i
, where the initial portfolio is set to (0,...,0). Thus, the
cumulative wealth after n periods can be expressed as
S
γ
n
= S
0
n
t=1
(b
t
·x
t
) ×
1
γ
2
×
m
i=1
b
t,i
ˆ
b
t1,i

.
Another practical issue is margin buying, which allows the portfolio managers
to buy securities with cash borrowed from securities brokers, using their own equity
positions as collateral. Following existing studies (Cover 1991; Helmbold et al. 1998;
Agarwal et al. 2006), we relax this constraint and evaluate it empirically. We assume
the margin setting to be 50% down and a 50% loan,
at an annual interest rate
of 6% (equivalently, the corresponding daily interest rate of borrowing, c, is set to
0.000238). With such a setting, a new asset named “margin component” is generated
for each asset, and its price relative for period t equals 2 ×x
t,i
1 c. In the case
of x
t,i
1+c
2
, which means the stock drops more than half, we simple set its mar-
gin component to 0 (Li et al. 2012).
As a result, if margin buying is allowed, the
total number of assets becomes 2m. By adding such a “margin component,” we can
magnify both the potential proﬁt or loss on the i-th asset.
2.3 Evaluation
One standard criterion to evaluate an OLPS strategy is its portfolio cumulative wealth
at the end of trading periods. As we set the initial wealth, S
0
= 1 and thus S
n
also
denote the portfolio cumulative return, which is the ratio of ﬁnal portfolio cumulative
wealth divided by its initial wealth. Another equivalent criterion, which considers
compounding effect, is annualized percentage yield (APY), that is, APY =
y
S
n
1,
where y is the number of years corresponding to n periods.
§
APYmeasures the average
wealth increment that a strategy could achieve in a year. Typically, the higher the
portfolio cumulative wealth or annualized percentage yield, the better the strategy’s
performance is.
Besides the absolute return metrics, it is also important to evaluate a strategy’s
risk and risk-adjusted return (Sharpe 1963, 1994). One common criterion is the annu-
alized standard deviation of portfolio period returns to measure volatility risk and
That is, if one has \$100 stock (down or collateral) one can borrow at most \$100 cash (loan).
Such a measure is not perfect since it manually changes the margin component, although less than
5 per dataset. One may refer to Györﬁ et al. (2012, Chapter 4) for other solutions to the possibility of ruin.
For example, assume two assets with price relatives of (1.1, 0.9). After adjustment, the price relative
vector becomes (1.1, 0.9, 1.2, 0.8). Putting wealth on the latter two margin components, the portfolio’s
proﬁt or loss magniﬁes. That is, 10% proﬁt (1.1) becomes 20% (1.1 ×2 1 c) and 10% loss (0.9) also
becomes 20% (0.9 ×2 1 c). Note that the portfolio vector representing the proportions of capital is
still a simplex.
§
T&F Cat #K23731 — K23731_C002 — page 14 — 9/28/2015 — 21:06
EVALUATION 15
the annualized Sharpe ratio (SR) (Sharpe 1966) to evaluate volatility risk-adjusted
return. To obtain the annualized standard deviation, we calculate the standard devi-
ation of daily returns and multiply by
252.
calculate the annualized SR as
SR =
APYR
f
σ
p
,
where R
f
is the risk-free return
and σ
p
is the annualized standard deviation. The
higher the annualized SR, the better the strategy’s (volatility) risk-adjusted return is.
Portfolio management community often conducts drawdown analysis (Magdon-
Ismail and Atiya 2004) to measure the decline from a historical peak of portfolio
cumulative wealth. Formally, a strategy’s drawdown (DD) at period t is deﬁned as
DD(t) = sup[0, sup
i(0,t)
S
i
S
t
]. Its maximum drawdown (MDD) is the maximum
of drawdowns over all periods and can effectively measure a strategy’s downside
risk. Formally, maximum drawdown for a horizon of n, MDD(n), is deﬁned as
MDD(n) = sup
t(0,n)
[DD(t)].
Moreover, practitioners also adopt the Calmar ratio (CR) (Young 1991) to measure
CR =
APY
MDD
.
The smaller the maximum drawdown, the more drawdown risk the strategy can tol-
erate. The higher the Calmar ratio, the better (drawdown) risk-adjusted return the
strategy is.
To test whether simple luck can generate the return achieved by a strategy, portfo-
lio management practitioners (Grinold and Kahn 1999) can conduct statistical tests.
Since all test datasets are just samples of the market population, such tests can val-
idate a strategy for future. We conduct a Student’s t-test to determine the likelihood
that the observed proﬁtability is due to chance alone (under the assumption that a
strategy is not proﬁtable in the population). Since the sample proﬁtability is being
compared with no proﬁtability, 0 is subtracted from the sample mean proﬁt/loss. Note
that (daily) proﬁt/loss equals (daily) return minus 1. The standard error of mean is cal-
culated as the standard deviation divided by square root of the number of periods. The
t-statistic is the sample proﬁt mean
divided by the sample standard error. Finally,
the probability of the t-statistic can be calculated with a degree of freedom equal to the
number of periods minus 1. Note that the Student’s t-test assumes that the underlying
distribution of data is normal. According to the central limit theorem, as the sample
size increases, the distribution of the sample mean approaches normal. If a sample
Here, 252 denotes the average number of annual trading days. For other frequencies, we can choose
their corresponding numbers.
Typically, it equals the return of Treasury bills, and we ﬁx it at 4% per year, or 0.000159 per day.
Suppose we compare the sample proﬁt mean with 0.
T&F Cat #K23731 — K23731_C002 — page 15 — 9/28/2015 — 21:06
• No Comment
..................Content has been hidden....................