10
Backtest – Signal or Overfitting

By Peng Yan

Traditionally, an alpha is defined as the active return of an investment after adjusting for risk. In this chapter, alpha means a quantitative model used to predict future investment returns.

BACKTEST

The start of an alpha design can be a hypothesis, a paper, a story, an inspiration, or just a random idea.

Ideas Need to be Tested

As in academic research, many assumptions are wrong and many trials are futile; only a few succeed. We are human, and other market participants are human as well. People differ because they have different ideas, and only a small portion of those ideas may generate profits consistently in the real environment. At times you will strongly believe that a model will work, yet testing proves it does not, or vice versa.

Asset prices can be affected by many factors, either directly or indirectly. One idea may capture just one of these factors and neglect the others.

Simulation and Backtest

We call the process of testing an idea simulation. There are different simulation methods, such as:

  1. Monte Carlo simulation, which simulates the various sources of uncertainty affecting instrument values in order to obtain a range of possible outcomes.
  2. Pricing model, which calculates an asset's price (the Black-Scholes model is an example of a pricing model for options; see the sketch following this list).
  3. Explanatory model, which is built to explain what happened in history.
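
As a concrete illustration of the second method, the sketch below prices a European call option with the Black-Scholes formula; the input values (spot, strike, interest rate, volatility, maturity) are arbitrary examples chosen only for demonstration.

```python
from math import log, sqrt, exp, erf

def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call option."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)

# Arbitrary illustrative inputs
print(black_scholes_call(spot=100.0, strike=105.0, rate=0.02, vol=0.25, maturity=1.0))
```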

In our working environment, simulation means backtesting. That is, when there is an idea, we apply it to historical data and check the model’s performance. The assumption behind backtesting is that if an idea worked in history, it is more likely to work in the future. By the same token, a model will not be considered if there is no historical simulation performance.

Backtest results are used for model pre-selection, for comparison between different models, and for judging an alpha’s potential value. They include different measures such as Sharpe ratio, turnover, returns, and correlation.
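
As a minimal sketch of this workflow, the example below backtests a hypothetical cross-sectional reversal alpha on a matrix of daily returns and reports the annualized Sharpe ratio and average daily turnover. The signal, the random "historical" data, and the dollar-neutral scaling are illustrative assumptions, not a recommended model.

```python
import numpy as np

def backtest(returns, weights):
    """Daily PnL, annualized Sharpe ratio, and average daily turnover.

    returns:  (T, N) matrix of daily instrument returns
    weights:  (T, N) matrix of portfolio weights chosen before each day
    """
    pnl = (weights * returns).sum(axis=1)
    sharpe = np.sqrt(252) * pnl.mean() / pnl.std()
    turnover = np.abs(np.diff(weights, axis=0)).sum(axis=1).mean()
    return pnl, sharpe, turnover

rng = np.random.default_rng(0)
rets = rng.normal(0.0, 0.01, size=(500, 100))   # illustrative random "history"

# Hypothetical alpha: bet against yesterday's return (cross-sectional reversal),
# demeaned so the book is dollar-neutral and scaled to unit gross exposure.
signal = -np.vstack([np.zeros((1, 100)), rets[:-1]])
signal -= signal.mean(axis=1, keepdims=True)
weights = signal / np.abs(signal).sum(axis=1, keepdims=True).clip(min=1e-12)

pnl, sharpe, turnover = backtest(rets, weights)
print(f"Sharpe: {sharpe:.2f}, daily turnover: {turnover:.2f}")
```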

Backtest is just one additional step once we have an idea. Good backtest performance is not sufficient for a profitable strategy. There are many other factors that will affect investment. As a general matter, one should not invest capital solely based on backtest simulation results. Some of the reasons are:

  1. The current market is not the same as the historical period. Market rules can change, as can the participants; new theories and new technologies appear.
  2. Simulation assumptions may not be realistic. Buying or selling an asset may move the market, and transaction costs or commissions must be paid. Reasonable estimates of these numbers are crucial when evaluating a simulation result.
  3. Possible forward-looking bias. If you saw someone follow a trend and make a profit, you might test a trend-following model and perhaps obtain a good historical simulation. Without a deeper understanding, you may or may not make a profit in future investment.
  4. Overfitting. Sometimes good simulation results are just random error or noise and have no prediction power.

Overfitting is the topic of this chapter. The word overfitting comes from the field of statistical machine learning and is critical in our backtest framework. The financial market is noisy, and even a very good model may have only minimal positive prediction power. Under the efficient market hypothesis, it is presumed that there is no arbitrage opportunity to make a profit. When you see good simulation results, you need to evaluate the overfitting risk of the models carefully.

OVERFITTING

Multiple techniques have been proposed to reduce overfitting risk, for example 10-fold cross-validation, regularization, and prior probabilities. Ten-fold cross-validation is a process in which you break the data into 10 sets of size n/10, train on 9 of the sets and test on the remaining one, then repeat 10 times and take the mean accuracy. Regularization, as used in statistics and machine learning, prevents overfitting during model selection by penalizing models with extreme parameter values. The prior probability of an uncertain quantity p is the probability distribution that expresses one’s uncertainty about p before some evidence is taken into account. Recently there have been papers on overfitting issues in the quantitative investment field, e.g. Bailey (2014a), Bailey (2014b), Beaudan (2013), Burns (2006), Harvey et al. (2014), Lopez de Prado (2013), Schorfheide and Wolpin (2012).
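
As a minimal sketch of the first of these techniques, the example below runs 10-fold cross-validation for a hypothetical predictive model; the ridge regression, the scoring function, and the synthetic data are assumptions chosen only to illustrate the procedure.

```python
import numpy as np

def ten_fold_cv(X, y, fit, score, k=10):
    """Split the data into k folds, train on k-1 folds, test on the held-out
    fold, and return the mean out-of-fold score."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        scores.append(score(model, X[test], y[test]))
    return float(np.mean(scores))

# Hypothetical model: ridge regression with a fixed penalty (closed-form fit).
def fit_ridge(X, y, lam=1.0):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def r2_score(beta, X, y):
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = X @ np.array([0.5, -0.2, 0.0, 0.0, 0.0]) + rng.normal(size=1000)
print(ten_fold_cv(X, y, fit_ridge, r2_score))
```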

  • Overfitting is easy: After trying just seven strategy configurations, a researcher can expect to identify at least one two-year backtest with an annualized Sharpe ratio above 1, even though the expected out-of-sample Sharpe ratio is 0. If the researcher tries a large enough number of strategy configurations, a backtest can always be fitted to any desired performance for a fixed sample length. Thus, a minimum backtest length should be required for a given number of trials (see the sketch following this list).
  • Correlation can cheat: Given a large number of random time series, the maximum correlation between a new random series and any of them can exceed 0.2 with high probability, even though all the series are pure noise and the true correlations are zero.
  • Financial markets have a memory effect: Overfitting to history will hurt out-of-sample performance, and trading noise will actually lose money in the real world because of transaction costs and market impact when trading in large size.
  • Academic papers have bias: Academic research papers do not report how many configurations were tried, and they do not report failures. Some published models cannot be reproduced by others.
  • A higher acceptance standard is needed: The number of discovered models has grown in the last few years and will continue to grow. We need a higher standard for alphas, especially for cross-sectional prediction models.
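
The first point is easy to check numerically. The sketch below draws many purely random two-year daily PnL series (strategies whose true Sharpe ratio is 0 by construction) and reports the best annualized in-sample Sharpe ratio among them; the trial counts and the 252-day trading calendar are illustrative assumptions.

```python
import numpy as np

def best_in_sample_sharpe(n_trials, n_days=2 * 252, seed=0):
    """Annualized Sharpe ratio of the best of n_trials random strategies.

    Each "strategy" is a series of i.i.d. zero-mean daily PnLs, so its true
    out-of-sample Sharpe ratio is 0 by construction.
    """
    rng = np.random.default_rng(seed)
    pnl = rng.normal(0.0, 0.01, size=(n_trials, n_days))
    sharpes = np.sqrt(252) * pnl.mean(axis=1) / pnl.std(axis=1)
    return sharpes.max()

for n in (1, 7, 50, 500):
    print(n, "trials -> best in-sample Sharpe:", round(best_in_sample_sharpe(n), 2))
```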

HOW TO AVOID OVERFITTING

There are some guidelines to reduce the overfitting risk. Some of them are borrowed from the statistical/machine learning field.

  • Out-of-sample test: To test an alpha model, the out-of-sample test needs to be a true out-of-sample test. That is, we build a model, run it daily in the real environment, and monitor how it performs. It is not valid to (1) backtest a model on the most recent N years of data and use the N years before that as the out-of-sample period, or (2) fit on one subset of instruments and use the remaining instruments as out of sample. In case (1), the recent N years of the market contain information about the older history, so models that worked recently also tend to work on that history. In case (2), instruments are correlated, so models that perform well in one universe tend to perform well in another.

    Please note: as the number of out-of-sample alphas grows, the out-of-sample test itself becomes biased. An alpha can perform well purely by luck, so out-of-sample performance at the single-alpha level is inadequate on its own.

  • Increase in-sample requirements: Requiring a higher Sharpe ratio, testing the model over a longer history, and testing the model on a wider universe all help to reduce the risk of overfitting. In the real world, unfortunately, there are constraints: there is no systematic way to increase the Sharpe ratio; either sufficiently long historical data is unavailable, or the market has changed and history that is too old means nothing; and there is no “wider universe,” because it is constrained by the number of instruments in the world.
  • Make the model elegant: An alpha is better if (1) it is simple and easy to understand; (2) it has a good theory or logic behind it, not just an empirical discovery; and (3) it can be explained and you can tell the story behind it. For example, alpha = returns may have the potential to be a good model, but alpha = returns + delta(volume) does not. The latter cannot work because one cannot add two quantities with different units (returns are measured in price terms, while volume is a count of shares).
  • Parameters and operations: As with machine learning models, a model with fewer parameters is less sensitive to parameter changes, which helps reduce overfitting risk. The value of spending time fitting parameters is small (see the sketch following this list).
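
As a sketch of why fitting parameters in-sample is dangerous, the example below grid-searches the lookback of a toy momentum rule on pure-noise data and then evaluates the chosen parameter on held-out later data; the rule, the data-generating process, and the split are assumptions for illustration only. The in-sample Sharpe ratio looks attractive, while the out-of-sample Sharpe ratio collapses toward zero.

```python
import numpy as np

def momentum_pnl(rets, lookback):
    """Daily PnL of a toy momentum rule: hold the sign of the trailing mean return."""
    trailing_mean = np.convolve(rets, np.ones(lookback) / lookback, mode="full")[:len(rets)]
    signal = np.sign(trailing_mean)
    # Trade the next day's return, so the signal only uses past data.
    return np.concatenate(([0.0], signal[:-1] * rets[1:]))

def sharpe(pnl):
    return np.sqrt(252) * pnl.mean() / pnl.std()

rng = np.random.default_rng(1)
rets = rng.normal(0.0, 0.01, size=2000)          # pure noise: no momentum exists
in_sample, out_sample = rets[:1000], rets[1000:]

# Grid-search the lookback parameter on the in-sample period only.
lookbacks = range(2, 60)
best = max(lookbacks, key=lambda lb: sharpe(momentum_pnl(in_sample, lb)))

print("best lookback:", best)
print("in-sample Sharpe:", round(sharpe(momentum_pnl(in_sample, best)), 2))
print("out-of-sample Sharpe:", round(sharpe(momentum_pnl(out_sample, best)), 2))
```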

These alternative methods can be useful:

  • Visualization: A graph contains more information than statistical numbers alone.
  • Number of trials: For a given methodology, recording the number of trials is helpful for evaluating overfitting risk.
  • Artificial data: It is useful to test models on artificial data sets (see the sketch below).
  • Dynamic models: Learning models dynamically is better than a single, static round of learning.
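
As a minimal sketch of the artificial-data idea, the example below runs a toy one-day reversal rule on two simulated return series: pure noise, where the rule should earn roughly nothing, and an AR(1) series with negative autocorrelation, where a reversal effect is planted by construction and should be recovered. The rule and the data-generating processes are illustrative assumptions.

```python
import numpy as np

def reversal_sharpe(rets):
    """Annualized Sharpe of a toy one-day reversal rule: fade yesterday's move."""
    pnl = -np.sign(rets[:-1]) * rets[1:]
    return np.sqrt(252) * pnl.mean() / pnl.std()

rng = np.random.default_rng(2)

# Control: pure noise, so the rule should show a Sharpe ratio near zero.
noise = rng.normal(0.0, 0.01, size=5000)

# Planted effect: an AR(1) return series with negative autocorrelation,
# i.e. artificial data in which one-day reversal is known to exist.
reverting = np.empty(5000)
reverting[0] = 0.0
for t in range(1, 5000):
    reverting[t] = -0.1 * reverting[t - 1] + rng.normal(0.0, 0.01)

print("Sharpe on noise:     ", round(reversal_sharpe(noise), 2))
print("Sharpe on reverting: ", round(reversal_sharpe(reverting), 2))
```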