Discussion of algorithms in backtesting

Once a backtesting model has been designed, one or more algorithms can be used to improve it on a continuous basis. This section briefly covers some of the algorithmic techniques applied in backtesting, drawn from areas such as data mining and machine learning.

K-means clustering

The k-means clustering algorithm is a method of cluster analysis in data mining. Given the backtest results of n observations, the algorithm partitions the data into k clusters based on the distances between observations, and computes the center point of each cluster. The objective is to minimize the within-cluster sum of squares, that is, the sum of squared distances from each observation to the center of its cluster. The resulting center gives us a model-averaged point, which indicates the likely average performance of the model and can be used for further comparison with the performance of other models.
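As a rough illustration of the procedure described above, the following is a minimal sketch of k-means in plain Python. The `results` data (pairs of annual return and maximum drawdown, in percent) are made up for the example, and the function names `sq_dist`, `assign`, and `kmeans` are hypothetical. Seeding the centers with the first k points is a simplification; practical implementations usually choose initial centers more carefully.

```python
from statistics import fmean

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centers):
    """Group each point with its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, max_iter=100):
    centers = list(points[:k])  # assumption: seed with the first k points
    for _ in range(max_iter):
        clusters = assign(points, centers)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            tuple(fmean(dim) for dim in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    clusters = assign(points, centers)
    # Within-cluster sum of squares: the quantity k-means tries to minimize.
    wcss = sum(sq_dist(p, centers[i]) for i, c in enumerate(clusters) for p in c)
    return centers, clusters, wcss

# Hypothetical backtest results: (annual return %, max drawdown %).
results = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (8.0, 9.0), (8.2, 9.1), (7.8, 8.9)]
centers, clusters, wcss = kmeans(results, k=2)
print(centers, wcss)
```

Each returned center plays the role of a model-averaged point for its cluster, and the within-cluster sum of squares can be compared across different choices of k.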

K-nearest neighbor machine learning algorithm

The k-nearest neighbor (KNN) algorithm is a lazy learning technique: it does not build an explicit model.

An initial set of backtest model parameters is chosen, either at random or by best guess.

After the results of the model are analyzed, the k sets of parameters closest to the original set are used for computation in the next step. The model then takes the set of parameters that gives the best results.

The process continues until the terminating condition is reached, at which point it returns the best set of model parameters found.
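The steps above can be sketched as a simple neighbor-based parameter search. This is only one possible reading of the procedure, not a standard KNN classifier. The `score` function is a hypothetical stand-in for running a backtest, the parameter grid of short and long window lengths is invented for the example, and the stopping rule (stop when no neighbor improves the score) is an assumed terminating condition.

```python
import math

def nearest_neighbors(current, candidates, k):
    """Return the k candidate parameter sets closest to `current`."""
    others = [c for c in candidates if c != current]
    return sorted(others, key=lambda c: math.dist(current, c))[:k]

def knn_parameter_search(start, candidates, score, k=4, max_steps=50):
    best, best_score = start, score(start)
    for _ in range(max_steps):
        neighbors = nearest_neighbors(best, candidates, k)
        challenger = max(neighbors, key=score)
        # Assumed terminating condition: no neighbor beats the current best.
        if score(challenger) <= best_score:
            break
        best, best_score = challenger, score(challenger)
    return best, best_score

def score(params):
    """Hypothetical stand-in for a backtest result; it peaks at (20, 100)."""
    short, long_ = params
    return -((short - 20) ** 2) - (long_ - 100) ** 2

# Hypothetical grid of (short window, long window) parameter sets.
grid = [(s, l) for s in range(5, 55, 5) for l in range(50, 205, 5)]
best, best_score = knn_parameter_search((5, 50), grid, score)
print(best, best_score)
```

Because the search only ever looks at nearby parameter sets, it can stop at a local optimum when the real backtest surface is not as smooth as this stand-in.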

Classification and regression tree analysis

Classification and regression tree (CART) analysis covers two types of decision trees used in data mining. A classification tree applies classification rules, through the nodes and branches of the tree, to classify the outcomes of a model. A regression tree attempts to assign a real value to the classified outcome. The resulting values are averaged to provide a measure of the quality of the decision.
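To make the regression-tree half of this concrete, here is a minimal sketch of a tree on a single feature, where each leaf predicts the average of the outcomes that fall into it. The data (a volatility reading paired with a backtest return) are made up, and the names `best_split`, `build_tree`, and `predict` are hypothetical. Splitting on squared error is a common choice but is assumed here rather than taken from the text.

```python
from statistics import fmean

def sse(values):
    """Sum of squared errors around the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return the threshold that minimizes the total squared error of both sides."""
    best = None
    for t in sorted(set(xs))[1:]:
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = (t, err)
    return best[0]

def build_tree(xs, ys, max_depth=2):
    """Return a leaf value (float) or a (threshold, left, right) node."""
    if max_depth == 0 or len(set(xs)) < 2:
        return fmean(ys)  # leaf: average of the outcomes in this branch
    t = best_split(xs, ys)
    left = [(x, y) for x, y in zip(xs, ys) if x < t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= t]
    return (t,
            build_tree(*zip(*left), max_depth - 1),
            build_tree(*zip(*right), max_depth - 1))

def predict(tree, x):
    """Follow the branches until a leaf value is reached."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x < t else right
    return tree

# Hypothetical data: volatility reading vs. backtest return (%).
volatility = [10, 12, 14, 30, 32, 34]
returns = [5.0, 6.0, 7.0, -2.0, -3.0, -4.0]
tree = build_tree(volatility, returns, max_depth=1)
print(tree, predict(tree, 11), predict(tree, 31))
```

A classification tree follows the same node-and-branch structure, but each leaf holds a class label instead of an averaged value.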

The 2^k factorial design

When designing experiments for backtesting, we can consider the use of a 2^k factorial design. Suppose we have two factors, A and B. Each factor behaves like a Boolean value, taking a value of either +1 or -1. A value of +1 indicates a quantitatively high level, while -1 indicates a low level. With two factors, this gives us 2^2 = 4 combinations of outcomes. For a 3-factor model, this gives us 2^3 = 8 combinations of outcomes. The following table illustrates an example with two factors with outcomes W, X, Y, and Z:

 

         A     B     Replication I
Value    +1    +1    W
Value    +1    -1    X
Value    -1    +1    Y
Value    -1    -1    Z

Note that we are generating one replication of the backtest to produce a single set of outcomes. Performing additional replications gives us more information. From this data, we can perform a regression and an analysis of variance. The objectives of these tests are to determine which factor, A or B, is more influential than the other, and which values should be chosen so that the outcomes are near some desired value, achieve a low variance, or minimize the effects of uncontrollable variables.
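As a small illustration of the design, the sketch below enumerates the 2^k runs and estimates the main effect of each factor as the difference between the average outcome at the high level and the average outcome at the low level. The numeric values assigned to W, X, Y, and Z are made up, and the helper names `factorial_design` and `main_effect` are hypothetical. A full analysis of variance would need more than one replication.

```python
from itertools import product
from statistics import fmean

def factorial_design(k):
    """All 2^k combinations of +1/-1 levels, highest levels first."""
    return list(product((+1, -1), repeat=k))

def main_effect(design, outcomes, factor):
    """Average outcome at the high level minus average outcome at the low level."""
    high = [y for run, y in zip(design, outcomes) if run[factor] == +1]
    low = [y for run, y in zip(design, outcomes) if run[factor] == -1]
    return fmean(high) - fmean(low)

design = factorial_design(2)          # [(1, 1), (1, -1), (-1, 1), (-1, -1)]
outcomes = [12.0, 10.0, 6.0, 4.0]     # hypothetical values for W, X, Y, Z

effect_a = main_effect(design, outcomes, factor=0)
effect_b = main_effect(design, outcomes, factor=1)
print(effect_a, effect_b)
```

With these invented numbers, factor A shifts the outcome more than factor B, which is the kind of comparison the design is meant to support.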

The genetic algorithm

The genetic algorithm (GA) is a technique in which a population of candidate solutions to an optimization problem evolves through a process modeled on natural selection. In each iteration, individuals are selected to become parents, and crossover and mutation produce the next generation of offspring. Over cycles of successive generations, the population evolves toward an optimal solution.

Genetic algorithms can be applied to a variety of optimization problems, including backtesting, and are especially useful for standard optimizations, discontinuous or non-differentiable problems, and nonlinear outcomes.
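The selection, crossover, and mutation cycle can be sketched in a few lines of plain Python. Everything specific here is an assumption made for illustration: the `fitness` function is a hypothetical stand-in for a backtest score, `BOUNDS` describes invented ranges for two window-length parameters, and keeping the fitter half of the population plus the single best individual is just one of many possible selection schemes.

```python
import random

BOUNDS = [(5, 50), (50, 200)]  # hypothetical ranges for (short, long) windows

def fitness(params):
    """Hypothetical stand-in for a backtest score; it peaks at (20, 100)."""
    short, long_ = params
    return -((short - 20) ** 2) - (long_ - 100) ** 2

def random_individual(rng):
    return tuple(rng.randint(lo, hi) for lo, hi in BOUNDS)

def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))

def mutate(individual, rng, rate=0.2, step=5):
    """Nudge each gene with probability `rate`, staying inside BOUNDS."""
    genes = []
    for gene, (lo, hi) in zip(individual, BOUNDS):
        if rng.random() < rate:
            gene = min(hi, max(lo, gene + rng.randint(-step, step)))
        genes.append(gene)
    return tuple(genes)

def evolve(generations=40, pop_size=30, seed=1):
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # selection: keep the fitter half
        children = [population[0]]             # elitism: carry the best forward
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = children
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history

best, history = evolve()
print(best, history[0], history[-1])
```

Because the best individual is always carried into the next generation, the best score in `history` never decreases, although nothing guarantees that the global optimum is reached.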
