• Search in book...
• Toggle Font Controls
Chapter 8
Correlation-Driven Nonparametric
Learning
As described in Part II, several approaches have been proposed to select portfolios
from ﬁnancial markets. The pattern matching–based approach, which is intuitive in
nature, can achieve best performance at the present time. However, one key chal-
lenge to this approach is to effectively locate a set of trading days whose price
relative vectors are similar to the coming one. As detailed in Section 6.1, existing
strategies often adopt Euclidean distance to measure the similarity between two pre-
ceding market windows. Euclidean distance can somehow measure the similarity;
however, it simply considers the neighborhood of the latest market windows and
ignores the linear or nonlinear relationship between two market windows, which is
important for price relative estimation. In this chapter, we propose to exploit similar
patterns via a correlation coefﬁcient, which effectively measures the linear relation-
ship, and further propose a novel pattern matching–based online portfolio selection
algorithm “CORrelation-driven Nonparametric learning” (CORN) (Li et al. 2011a).
The proposed CORN algorithm can better locate a similarity set, and thus can output
portfolios that are more effective than existing pattern matching–based strategies.
Moreover, we also proved CORN’s universal consistency,
which is a nice property
for the pattern matching–based algorithms. Further, in Part IV, we will extensively
evaluate the algorithm on several real stock markets, where the encouraging results
show that the proposed algorithm can easily beat both market index and best stock
substantially (without or with small transaction costs) and also surpass a variety of
the state-of-the-art techniques signiﬁcantly.
This chapter is organized as follows. Section 8.1 motivates the proposed correla-
tion metric for selecting similarity sets. Section 8.2 details the ideas of the proposed
online portfolio selection algorithm, and then Section 8.3 illustrates the proposed
algorithms. Section 8.4 proves CORN’s universal consistency and further analyzes
the proposed algorithms. Finally, Section 8.5 summarizes this chapter and indicates
future directions.
This property is missing in Li et al. (2011a).
47
T&F Cat #K23731 — K23731_C008 — page 47 — 9/28/2015 — 21:18
48 CORRELATION-DRIVEN NONPARAMETRIC LEARNING
8.1 Preliminaries
8.1.1 Motivation
One main idea of existing approaches is to optimize portfolios by mining similar
patterns and information from historical market sequences. Anticor (Borodin et al.
2004) attempts to ﬁnd statistical relations between pairs of stocks, such as posi-
tive auto covariance and negative cross-covariance, while pattern matching–based
strategies (Györﬁ et al. 2006, 2008) try to discover similar appearances among histor-
ical markets. Though successful in mining statistical relations among stocks, Anticor
ignores market movements, which are crucial for a portfolio selection task. Moreover,
Anticor is heuristic in nature, which could lead to suboptimal solutions. On the other
hand, existing pattern matching–based strategies (Györﬁ et al. 2006, 2008) rely on
Euclidean distance to measure the similarity between two market windows. Though
their empirical performance is excellent, the Euclidean distance cannot exploit the
directional information between the two market windows. Therefore, it may detect
some useful price relatives, but often includes some potentially useless or even harm-
ful price relatives and excludes many beneﬁcial price relatives. Such a similarity set
will ﬁnally weaken the following portfolio optimization step, resulting in less effective
portfolios.
To better understand the drawbacks of Euclidean distance in measuring the simi-
larity between two market windows, we give a motivating example in Figure 8.1. Let
us assume that all market windows consist of two price relatives, such as a market of
one asset and the window size is two, or a market with two assets and the window size
equals one. Let the latest market window for the t-th period be x
t1
t2
= (1.10, 1.20).
Clearly, x
t1
t2
shows an increasing trend, and we aim to locate similar market windows
that also show increasing trends. Suppose we have three possible pairs of market win-
dows: A1: (0.90, 0.80), A2: (0.80, 0.90); B1: (1.2, 1.1), B2: (1.1, 1.2); C1: (1.4, 1.3),
C2: (1.3, 1.4). Note that in a long-only portfolio, relative trends, rather than absolute
trends, determine the allocations of capital.
For example, although A2 contains two
decreasing price relatives (both 0.90 and 0.80 are less than 1), the market sequence is
relatively increasing (0.90 > 0.80). In case that the vectors contain two assets, for the
recent market window x
t1
t2
, it is better to allocate more capital on the second asset
(1.20 > 1.10), which is also the case in A2. However, this is not the case in B1 or C1,
though their absolute price relatives are all increasing.
Among the three pairs, A2,
B2, and C2 show increasing trends, while A1, B1, and C1 show decreasing trends.
Thus, a good similarity measure should classify A2, B2, and C2 as similar appear-
ances, which will beneﬁt the next step, and A1, B1, and C1 as dissimilar appearances,
which will harm the subsequent portfolio optimization step.
Now let us classify these market sequences via a Euclidean distance measure with
that is,
x
i1
i2
x
t1
t2
0.2. According to Figure 8.1c, a Euclidean
In our problem setting, there are no cash or risk-free assets. In reality, a weaker constraint (e.g., at
most, 90% of capital can be put in assets), may appear in mutual funds.
Because their ﬁrst asset is more favorable than the second one, which is different from the latest x
t1
t2
.
The radius is arbitrarily chosen to to limit the number of selected price relatives.
T&F Cat #K23731 — K23731_C008 — page 48 — 9/28/2015 — 21:18
PRELIMINARIES 49
Periods
Price relatives
0.80
1.00
1.20
1.40
X
t–1
t–2
i –2
i –1
t –2 t –1
...
A1
(a)
A2
B1
B2
C1
C2
Sequence Price relatives
x
t1
t2
(1.1, 1.2)
A1 (0.9, 0.8)
A2 (0.8, 0.9)
B1 (1.2, 1.1)
B2 (1.1, 1.2)
C1 (1.4, 1.3)
C2 (1.3, 1.4)
(b)
x
t1
t2
A1 A2 B1 B2 C1 C2
Euclidean distances 0.45 0.42 0.14 0 0.32 0.28
Similar? (Y/N) N N Y Y N N
Correlation coefﬁcients 1 1
1 1 1
1
Similar? (Y/N)
N Y N Y N Y
(c)
Figure 8.1 A motivating example to illustrate the limitation of Euclidean distance.
(a) Market windows A1, A2, B1, B2, C1, C2, and x
t1
t2
, each of which contains two price
relatives. (b) The price relative vectors. (c) The market windows via Euclidean distance and
correlation coefﬁcient.
measure will classify B1 and B2 to the similarity set, since they are both located
within the Euclidean ball of x
t1
t2
(with a radius of 0.2). Such a classiﬁcation is
clearly suboptimal, as it includes harmful B1 and excludes beneﬁcial A2 and C2. As a
consequence of the imperfect similarity set, the subsequent portfolio optimization
will considerably suffer from irrelevant or even harmful market windows (such as
market window B1) and the neglect of beneﬁcial market windows (such as market
T&F Cat #K23731 — K23731_C008 — page 49 — 9/28/2015 — 21:18
50 CORRELATION-DRIVEN NONPARAMETRIC LEARNING
windows A2 and C2). This motivates us to overcome the limitation by exploring a
more effective similarity measure.
8.2 Formulations
The proposed algorithm is mainly inspired by the idea of exploiting statistical correla-
tions between two market windows, and also driven by the consideration of exploring
the powerful nonparametric learning techniques to effectively optimize a portfolio.
Traditional portfolio selection methods in ﬁnance often try to estimate a target
function based on past data and build portfolios based on the learned function. How-
ever, since the ﬁnancial market is complex and accurate modeling of its movements
is a difﬁcult task, we adopt a nonparametric learning approach (or instance-based
learning, or case-based learning) (Aha 1991; Aha et al. 1991; Cherkassky and Mulier
1998). Nonparametric learning makes no assumptions on data distribution (or mar-
ket distribution), and it captures the knowledge from stored training data without
building any target functions. In particular, at the beginning of every period, the pro-
posed algorithm locates similar price relatives among all past price relatives, and then
maximizes the expected multiplicative portfolio return directly based on the similar
appearances. Without estimating any global functions of the market movements, the
proposed algorithm estimates a target value of next price relative.
To overcome the limitation of Euclidean distance in mining historical market
windows and the negligence of whole-market movements in all existing strategies,
we propose to employ the Pearson product–moment correlation coefﬁcient, which is
an effective tool for measuring statistical linear relationships. Note that it measures
the statistical correlations between market windows of all assets, rather than pairs
of assets as Anticor does. Since market windows of all assets represent the whole-
market movements in a period, they could be more effective to match the similar price
relatives regarding the whole market.
Till now, we declare a correlation-similar set that contains historical trading days
whose previous market windows are statistically correlated to the latest one, and
formally deﬁne it as
C
t
(w, ρ) =
w<i<t+1
cov
x
i1
iw
, x
t
tw+1
std
x
i1
iw
std
x
t
tw+1
ρ
!
,
where w denotes the window size, 1 ρ 1 is a correlation coefﬁcient threshold,
cov(A, B) denotes the covariance between market windows A and B, and std(A)
denotes the standard deviation of market window A. If either std term equals 0, that
is, the market is of zero volatility in a speciﬁc window, we simply set the correlation
coefﬁcient to 0. In the above calculation, both matrix x
i1
iw
and x
t
tw+1
are concate-
nated into m ×w-dimensional vectors, and we can obtain the univariate correlation
coefﬁcient between the two market windows.
The correlation coefﬁcient measure distinguishes the proposed algorithm from
previous nonparametric learning strategies, which measure the similarity via
Euclidean distance. First, Euclidean distance only considers the magnitude between
T&F Cat #K23731 — K23731_C008 — page 50 — 9/28/2015 — 21:18
ALGORITHMS 51
two market windows, while the proposed correlation coefﬁcient measures their linear
similarity, in both magnitude and direction. On the one hand, it considers the direction.
For example, ρ
1
= 0.8 and ρ
2
=−0.8 intuitively correspond to equivalent magni-
tudes of linear dependence or similarity; however, they are in opposite directions, that
is, the ﬁrst market window is in the same trend with the latest market window, and the
other is opposite. On the other hand, it also considers the magnitude. For example,
ρ
1
= 0.8 and ρ
2
= 0.2 clearly indicate that the ﬁrst market window is more suitable
than the second one (ρ
1
> ρ
2
). Thus, the correlation coefﬁcient measure considers not
only magnitude but also direction, which are appropriately balanced. With such lin-
ear dependence, we can better identify similar price relatives, thus leading to superior
performance. Euclidean distance may also be used to measure directional information
indirectly, for example, by using the slope of two centralized points. However, such
a method only measures the directional information but ignores their magnitude.
Second, to calculate the univariate correlation coefﬁcient, we will calculate the
arithmetic mean of both m ×w-dimensional vectors. This mean return is uniformly
distributed among m assets over w periods, which is in essence the market strategy.
As a result, the mean return actually reﬂects the whole-market movements during the
window.Thecorrelationcoefﬁcient measures the lineardependencybetweentwo mar-
ket windows, whose means represent the whole-market movements. This, therefore,
distinguishes the proposed strategy from Anticor strategy and existing nonparametric
learning strategies, which ignore the whole market.
Now let us return to the preceding motivating example and select a similarity
set via correlation coefﬁcient metric, with a threshold of 0. Figure 8.1(c) clearly
shows that the metric can correctly classify these market windows, whose results are
identical to our intuitive analysis. In particular,A2, B2, and C2 are classiﬁed as similar,
and A1, B1, and C1 are classiﬁed as dissimilar. Note that our example is extremely
straightforward and thus results in extreme values (either +1or1), which is not
always the case.
8.3 Algorithms
Next, we present the proposed CORN algorithm, which exploits the correlation-
similar set in optimizing portfolios for actively rebalancing.
We start by deﬁning a set of W ×P experts, each expert indexed by (w, ρ), that is,
{E(w, ρ) : w 1, 1 ρ 1},
and W represents the maximum window size and P represents the number of corre-
lation coefﬁcient thresholds. Each expert E(w, ρ) represents a CORN expert learning
algorithm and outputs a portfolio, denoted as E(w, ρ) = b(w, ρ).
As summarized in Algorithm 8.1, a CORN expert learning algorithm consists of
two major steps. The ﬁrst step, as illustrated in Section 8.2, is to locate a correlation-
similar set via the correlation coefﬁcient metric, and the second step is to obtain an
optimized portfolio that can maximize the expected return, which is the main target
of our research. After calculating the correlation-similar set C
t
(w, ρ) at the end of
T&F Cat #K23731 — K23731_C008 — page 51 — 9/28/2015 — 21:18
• No Comment
..................Content has been hidden....................