CHAPTER 9
Machine Learning and Event Detection for Trading Energy Futures

Peter Hafez and Francesco Lautizi

9.1 INTRODUCTION

The commodity futures spectrum is an integral part of today's financial markets. Specifically, energy‐related contracts such as crude oil, gasoline and natural gas, among others, all react to the ebbs and flows of supply and demand. These commodities play a crucial role in everyday life, as they fuel most of the world's transportation systems and are inputs to businesses across all industrial sectors; hence they are inherently linked to the economic cycle. Everything from economic indicators such as gross domestic product and the unemployment rate to political upheaval and natural disasters, not to mention commodity‐specific issues like oil and gas pipeline disruptions or embargoes, contributes to the pricing of commodity futures (Table 9.1).

Table 9.1 Performance statistics

Source: RavenPack, January 2018.

Statistics                 Out‐of‐sample
                           Ensemble    High‐vol    Low‐vol
Annualized return          9.8%        21.3%       −3.0%
Annualized volatility      15.0%       16.9%       15.3%
Information ratio          0.65        1.27        −0.20
Hit ratio                  51.1%       53.9%       47.5%
Max drawdown               38.3%       18.0%       62.2%
Per‐trade return (bps)     3.88        8.82        −1.97
Number of trades           2740        1929        811

The high‐vol and low‐vol strategies trade only during the corresponding volatility regime, while the ensemble strategy trades irrespective of the regime. The out‐of‐sample period is January 2015 to December 2017.

In previous research, Brandt and Gao (2016) took a novel approach by constructing supply and demand sentiment indices, using RavenPack data, to model the price impact of geopolitical and macroeconomic events and sentiments on crude oil. In particular, they found that news about macroeconomic fundamentals had a predictive ability over a monthly horizon, while geopolitical events sizably affected the price, but without sign predictability in the short term.

Rather than relying on a single commodity strategy, we seek to build predictive models for a group of commodities by means of RPA's event detection capabilities. By utilizing RPA 1.0, investors can benefit from the latest innovations in natural language processing (NLP) technology to identify the information that matters for commodities. With the latest release, the RavenPack event taxonomy has grown to more than 6800 event categories, allowing the swift and precise identification of market‐moving events across multiple asset classes and commodities. Events include supply increases, import/export guidance, inventory changes and more.

We select four commodity futures related to energy. We proceed to model the one‐day‐ahead volatility‐adjusted returns for the energy basket using an ensemble of machine learning techniques. Our results indicate that our mix of linear models performs well in terms of risk‐adjusted returns. However, including a wider spectrum of non‐linear models, e.g. artificial neural networks (ANNs) or gradient boosted trees regression, provides a way to improve performance and at the same time reduces the risk associated to model selection. Moreover, we demonstrate how return predictability at the basket level can be enhanced by conditioning on volatility regimes.

The study is organized as follows. Section 9.2 presents the different data sources used, in particular how the input variables from RPA 1.0 are constructed. Section 9.3 describes the modelling framework, which is based on five machine learning algorithms. Section 9.4 compares the performance of the various models introduced in Section 9.3. Section 9.5 presents the general conclusions.

9.2 DATA DESCRIPTION

By using NLP techniques, RavenPack transforms large unstructured datasets, such as traditional news and social media, into structured and machine‐readable granular data and indicators that can be included in quantitative models, allowing investors to identify entities in the news and to link these to actionable events that are most likely to impact asset prices. Each event is further supported by various analytical measures, including sentiment, novelty (Event Similarity Days) and relevance (Event Relevance).

To create the strategies, we consider all commodity‐related news stories from RPA spanning a period of nearly 13 years, from January 2005 to December 2017.

Some mild restrictions are imposed on the dataset related to event detection and novelty. In particular, it is required that a news story can be matched with an event from the RavenPack event taxonomy. Furthermore, only events with Event Similarity Days (ESD) ≥ 1 are allowed, in order to remove duplicated news events on an intraday basis. Contrary to recent in‐house research on equities (Hafez and Koefoed 2017a,b; Hafez and Guerrero‐Colón 2016; Hafez and Lautizi 2016), we do not condition on the event relevance score (ERS), as the tradeoff between stronger per‐event predictability and lower event frequency does not work in our favour herein. Instead, we use it directly in our feature construction, as detailed below in Section 9.3.1.

By imposing these restrictions, we are limited to a subset of all available event categories in RPA. In particular, during our backtest, we find 110 unique event categories with at least one recorded event across our commodities universe. Crude oil is the most prevalent of all our commodities, with 103 unique event categories compared with the average commodity that has 34 categories.

Out of the 89 commodities covered by RavenPack, we select some of the most heavily traded energy‐related ones. The basket contains four commodities, with crude oil and natural gas being the most prominent. Table 9.2 provides an overview of the commodities in our study.

Table 9.2 Summary statistics for RavenPack Analytics

Commodity      Events     Days with    Days with      Events per day
                          events       events (%)     Mean   25th pct   Median   75th pct   Max
Crude oil      316 959    4 721        99.4           67     22         53       94         480
Gasoline       48 838     4 407        92.8           11     3          7        15         147
Heating oil    6 865      3 085        65.0           2      1          2        3          46
Natural gas    64 932     4 594        96.8           14     6          12       19         99

Summary statistics for four energy commodities from January 2005 to December 2017. The numbers are based on the time‐shifted data, see Section 9.2.1.

Given the sheer number of event categories available in RPA, the dimensionality of the matrix of independent variables becomes large – even with the restrictions mentioned above. Put differently, we are facing the well‐known curse of dimensionality (Donoho 2000), which renders traditional OLS regression impractical due to overfitting in the absence of feature selection. In previous research (Hafez and Koefoed 2017a,b), we have relied on OLS to model equity returns using RavenPack's Event Sentiment Score, but given the dimensionality confronting us, and the fact that there may be nonlinearities at play between the various event categories, we move beyond OLS and instead implement a batch of machine learning techniques, as detailed in Section 9.3.

9.2.1 Price data

As part of the study, we use daily close‐to‐close commodity futures returns provided by Stevens Analytics. To model the next‐day (logarithmic) return, we use RavenPack data specific to each commodity up until 15 minutes before the settlement price of the given commodity futures is determined. For example, the settlement price for crude oil futures is computed between 2:28 pm ET and 2:30 pm ET on CME Globex, meaning that we use RavenPack data available in the 24 hours up to 2:15 pm ET as input to our models.

Given that we are dealing with a basket of commodity futures with wildly varying volatilities – both across the spectrum and across time – we seek to volatility‐adjust the returns by a lagged rolling standard deviation. We do this to avoid over‐emphasizing those commodity futures with the highest volatility for each basket when estimating the models.

We define the log‐return, standard deviation and volatility‐adjusted log‐return as follows:

(9.1) $r_{t,n} = \ln P_{t,n} - \ln P_{t-1,n}$

(9.2) $\sigma_{t,n} = \sqrt{\frac{1}{m-1}\sum_{i=0}^{m-1}\left(r_{t-i,n} - \bar{r}_{t,n}\right)^{2}}$

(9.3) $\tilde{r}_{t,n} = \frac{\sigma_{\mathrm{target}}}{\sigma_{t-1,n}}\, r_{t,n}$

where n = 1, …, N represents the four commodity futures and t = 1, …, T is the time index identifying the day on which the price or return was observed (observations can be missing on commodity‐specific non‐trading workdays). The parameter m defines the length of the window over which the standard deviation is calculated, while σ_target defines the target volatility. Throughout this study, we use 21 trading days to calculate the standard deviation (m = 21) with an annualized target volatility of 20% (σ_target = 20%/√252 on a daily basis). We have not optimized the parameter m, but we find that it provides a good tradeoff between stability and variability.
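The three definitions above can be sketched in pandas as follows; the function name and the use of 252 trading days to de‐annualize the volatility target are our own assumptions, not the chapter's published code:

```python
import numpy as np
import pandas as pd

def vol_adjusted_returns(prices, m=21, annual_target=0.20):
    """Sketch of Eqs. (9.1)-(9.3): volatility-adjust daily log-returns
    using a one-day-lagged rolling standard deviation (window m = 21)
    and a 20% annualized volatility target."""
    log_ret = np.log(prices).diff()                 # Eq. (9.1)
    rolling_sd = log_ret.rolling(m).std()           # Eq. (9.2)
    sigma_target = annual_target / np.sqrt(252)     # daily target volatility
    # Lag the standard deviation one day so only past information
    # enters the adjustment of Eq. (9.3).
    return log_ret * sigma_target / rolling_sd.shift(1)
```

Lagging the standard deviation by one day keeps the adjustment free of look‐ahead bias: the scaling applied to today's return uses only information available yesterday.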

9.3 MODEL FRAMEWORK

To evaluate the RPA suite, we utilize a range of machine learning techniques, from a linear model in the shape of elastic net regression to neural‐network and tree‐based models. In total, we test five different models. To optimize the various hyper‐parameters of the models, we use ten‐fold cross‐validation (CV) and recalculate the results ten times to account for the inherent randomness in some of the models and in the CV process. The five models are:

  • elastic net regression (ELNET) (Zou and Hastie 2005)
  • k‐nearest neighbour regression (KNN) (Altman 1992)
  • artificial neural network (ANN) (Hastie et al. 2009)
  • random forest (RF) (Breiman 2001)
  • gradient boosted trees with Gaussian loss function (GBN) (Friedman et al. 2000).
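A model zoo of this kind might be assembled with scikit‐learn as below. The hyper‐parameter grids are illustrative placeholders of our own; the chapter does not disclose its search spaces, and the Gaussian‐loss boosting (GBN) is represented here by squared‐error gradient boosting:

```python
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

def build_models(seed=0):
    """Wrap each of the five models in a ten-fold CV hyper-parameter
    search.  Grids below are illustrative, not the chapter's."""
    zoo = {
        "ELNET": (ElasticNet(max_iter=5000),
                  {"alpha": [0.01, 0.1, 1.0], "l1_ratio": [0.2, 0.5, 0.8]}),
        "KNN":   (KNeighborsRegressor(),
                  {"n_neighbors": [5, 25, 100]}),
        "ANN":   (MLPRegressor(max_iter=500, random_state=seed),
                  {"hidden_layer_sizes": [(10,), (25,)]}),
        "RF":    (RandomForestRegressor(n_estimators=200, random_state=seed),
                  {"max_features": [0.3, 0.7]}),
        # Squared-error loss corresponds to boosting with a Gaussian loss.
        "GBN":   (GradientBoostingRegressor(loss="squared_error", random_state=seed),
                  {"learning_rate": [0.01, 0.1]}),
    }
    return {name: GridSearchCV(est, grid, cv=10, scoring="neg_mean_squared_error")
            for name, (est, grid) in zoo.items()}
```

Each entry exposes the same fit/predict interface, which makes the walk‐forward estimation and the later ensembling straightforward to write once for all five models.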

All five models use an additive error term, e, that is independent across time and commodity, but we do not make any assumptions about its data‐generating process:

(9.4) $\tilde{r}_{t+1,n} = f(x_{t,n}) + e_{t+1,n}$

where the functional form, f, depends on the model and xt,n is a vector of independent variables. Note that the size of xt,n varies over time depending on the number of event categories with enough news stories to be included. We describe the independent variables more thoroughly in Section 9.3.1.

We use a walk‐forward method whereby, on the first trading day of the year, we find the best hyper‐parameter settings for each of the five models using the previous ten years' worth of data. We then predict the year's daily volatility‐adjusted log‐returns (Eq. (9.3)) for each of the five models. In other words, we estimate the models for the period 2005–2014 and make daily predictions for 2015. We then step forward one year in time and carry out the estimation and daily prediction steps again.

To address this randomness, we repeat this procedure ten times, resulting in ten series of predictions per model. In our strategies we use the average prediction over the ten runs for each model.

This procedure starts on 1 January 2015 and ends on 31 December 2017, representing our out‐of‐sample period.
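The walk‐forward scheme can be sketched as follows, assuming a scikit‐learn‐style model; re‐randomizing the CV/model seeds on each of the ten runs is elided here, so for deterministic models the runs coincide:

```python
import numpy as np
import pandas as pd
from sklearn.base import clone

def walk_forward_predict(model, X, y, train_years=10, start=2015, end=2017, runs=10):
    """Walk-forward scheme of Section 9.3 (sketch): refit each year on the
    prior `train_years` of data, predict that year's daily returns, and
    average the predictions over `runs` re-estimations."""
    out = []
    for year in range(start, end + 1):
        train = (X.index.year >= year - train_years) & (X.index.year < year)
        test = X.index.year == year
        yearly = [clone(model).fit(X[train], y[train]).predict(X[test])
                  for _ in range(runs)]
        out.append(pd.Series(np.mean(yearly, axis=0), index=X.index[test]))
    return pd.concat(out)
```

Re‐cloning the model each year guarantees that no state leaks across estimation windows, so each year's predictions depend only on its own ten‐year training set.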

9.3.1 Feature creation

All the models presented herein use the same input: the volatility‐scaled log‐returns as target variable and a matrix of continuous variables as features. The features are designed to capture the occurrence of an event category for commodity n at time t, as well as its relevance, and are constructed as follows:

(9.5) $\overline{ERS}_{t,n,j} = \frac{1}{I}\sum_{i=1}^{I} ERS_{t,n,j,i}$

(9.6) $x_{t,n,j} = \begin{cases}\overline{ERS}_{t,n,j}, & I \geq 1\\ 0, & I = 0\end{cases}$

where i = 1, …, I indexes the events detected for category j on day t for commodity n.

In other words, if we detect at least one news story for a given event category for date t and commodity n, the variable switches from 0 to the average ERS of the day's matching stories. This implies that news stories are weighted based on their ERS – thereby giving higher importance to news stories where the event is featured prominently, for example in the headline.
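This feature construction amounts to a pivot of the event stream; a sketch follows, in which the column names `date`, `commodity`, `category` and `ers` are hypothetical placeholders for the relevant RPA fields:

```python
import pandas as pd

def category_features(events):
    """Eqs. (9.5)-(9.6) as a pivot (sketch): for each (date, commodity)
    pair, an event-category column equals the average ERS of that day's
    matching stories, and 0 when no story was detected."""
    return (events.groupby(["date", "commodity", "category"])["ers"]
                  .mean()                                 # average over the I events
                  .unstack("category", fill_value=0.0))   # 0 = category not triggered
```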

In order for a given event category to be included in the modelling of the commodities basket, we require at least 50 days with one or more events across the in‐sample dataset. This requirement is not optimized and is simply introduced to remove very infrequent event categories from consideration. Furthermore, we remove perfectly correlated independent variables as appropriate.

Finally, we perform feature selection based on the in‐sample data by requiring that there is an absolute correlation of at least 0.5% between each feature and our target variable. This results in a reduction of the number of features of between 37% and 45% depending on the out‐of‐sample year, meaning that we are left with 34–37 predictors. During preliminary research we found that imposing this restriction on the features improved speed and, importantly, in‐sample robustness of the five machine learning algorithms.8
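The correlation screen can be written in a couple of lines; the function name is our own, and features whose correlation is undefined (e.g. constant columns) simply fail the threshold and are dropped:

```python
import numpy as np
import pandas as pd

def select_features(X, y, min_abs_corr=0.005):
    """Screening step of Section 9.3.1 (sketch): keep only features whose
    in-sample absolute correlation with the target is at least 0.5%."""
    corr = X.corrwith(y).abs()
    # NaN correlations compare False against the threshold, so constant
    # or degenerate features are dropped automatically.
    return X.loc[:, corr >= min_abs_corr]
```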

9.4 PERFORMANCE

All strategies in our study use portfolio weights derived from the predicted returns of the models presented in Section 9.3. The sign of a predicted return determines the direction of the trade, while the relative size of the predicted returns determines the portfolio weight. The vector of predicted returns for a given day is normalized to ensure a gross exposure of 1. The net exposure, meanwhile, can range from −1 to 1. All reported results exclude transaction costs.
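The mapping from predictions to weights described above can be sketched as follows (function name ours):

```python
import numpy as np

def weights_from_predictions(pred):
    """Portfolio construction of Section 9.4 (sketch): the sign of each
    predicted return sets trade direction, the relative size sets the
    weight, and dividing by the sum of absolute predictions fixes gross
    exposure at 1 (net exposure then lies in [-1, 1])."""
    pred = np.asarray(pred, dtype=float)
    gross = np.abs(pred).sum()
    return pred / gross if gross > 0 else np.zeros_like(pred)
```

For example, predictions of (0.02, −0.01, 0.01) map to weights (0.5, −0.25, 0.25): gross exposure 1, net exposure 0.5.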

9.4.1 Model portfolios

Table 9.3 shows a set of in‐sample performance statistics for our five ML models across the commodities basket – resulting in five different portfolios. Overall, we find positive performance, with some rather big discrepancies across the models. In particular, we notice that the linear model is outperformed across all metrics by all the non‐linear ones, with the sole exception of random forest (RF). The gradient boosted trees (GBN) and the ANN are the best models in‐sample, showing similar IRs of 2.40 and 2.39 respectively.

Table 9.3 In‐sample performance statistics.

Source: RavenPack, January 2018.

Statistics ELNET KNN ANN RF GBN
Ann. return 17.6% 19.9% 34.6% 7.3% 37.3%
Ann. volatility 15.6% 14.3% 14.5% 15.1% 15.5%
IR 1.13 1.39 2.39 0.48 2.40
Hit ratio 53.5% 54.5% 57.8% 51.7% 56.0%
Max drawdown 22.4% 17.2% 16.34% 35.9% 17.9%

For each statistic, the bolded number is the best among the five models.

Having shown the in‐sample performance, in Table 9.4 we present the out‐of‐sample results. We find the RF model to be the best performer, as it delivers an IR of 0.85 on an annualized return of 13.1%. The second‐best model is the KNN, which yields an IR of 0.83 and the lowest volatility (14.1%), followed by the GBN, which provides an IR of 0.76 and 12.3% annualized returns. The linear model (ELNET) is the second‐to‐last model, suggesting that modelling non‐linear relationships between our explanatory variables and commodities returns does indeed provide an edge that allows superior performance to be obtained.

Table 9.4 Out‐of‐sample performance statistics

Source: RavenPack, January 2018.

Statistics ELNET KNN ANN RF GBN
Ann. return 8.5% 11.7% 4.0% 13.1% 12.3%
Ann. volatility 15.8% 14.1% 14.7% 15.4% 16.2%
IR 0.54 0.83 0.27 0.85 0.76
Hit ratio 51.4% 51.7% 50.6% 52.6% 51.3%
Max drawdown 36.2% 19.1% 45.1% 30.7% 23.6%

For each statistic, the bolded number is the best among the five models.

Moreover, it is noteworthy that had we chosen a model based on the in‐sample evidence alone, this would have resulted in suboptimal out‐of‐sample performance: the two best models in‐sample rank third and last out‐of‐sample, evidence that warns against the risk inherent in model selection.

Considering that there is only a moderate positive correlation between the predicted returns for RF and most of the better‐performing models, such as KNN (0.55), we may be able to benefit from combining the predictions of the various models and in this way control for the risk associated with model selection, something that we will explore in Section 9.4.3.

9.4.2 Variable importance

Up until now we have answered only the question of whether our framework can produce alpha, not the question of which variables and in turn which event categories drive that alpha‐generating performance. We choose RF to answer this question as it (i) is the best‐performing model with an out‐of‐sample IR of 0.85, and (ii) provides an elegant way of computing and analyzing variable importance.

In order to detect the most important categories, we rely on a measure based on residual sum of squares, i.e. the total decrease in node impurities from splitting on the variable.

In Figure 9.1, we show the top‐ten categories. In order to obtain a measure of relative importance by year, we first compute the total decrease in node impurities by year. We then rescale the measure of importance by this amount, in order to provide a measure of the relative importance of each category in each out‐of‐sample year.
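This yearly rescaling might be implemented as below, assuming a hypothetical container mapping each out‐of‐sample year to its fitted forest:

```python
import pandas as pd

def relative_importance_by_year(models_by_year, feature_names):
    """Figure 9.1's rescaling (sketch): divide each fitted forest's
    impurity-based importances by their yearly total so that every
    year's importances sum to 1.  `models_by_year` maps year ->
    fitted RandomForestRegressor."""
    rel = {year: m.feature_importances_ / m.feature_importances_.sum()
           for year, m in models_by_year.items()}
    return pd.DataFrame(rel, index=feature_names)
```

In scikit‐learn, `feature_importances_` is the impurity‐based (total decrease in node impurity) measure described in the text; the explicit division makes the within‐year normalization self‐documenting.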

[Figure 9.1: comparative bar chart of relative variable importance for 2015–2017, for features including spill-commodity, commodity-price-loss and commodity-price-gain.]

Figure 9.1 Relative variable importance using RF. Features are scaled by the sum of variable importance over all the variables in each year, thereby providing a relative‐importance interpretation for each out‐of‐sample year. For out‐of‐sample year 2015, the estimation window is 2005–2014; for 2016, it is 2006–2015; and for 2017, it is 2007–2016.

Source: RavenPack, January 2018.

The analysis of the most important variables provides a sensible picture. In particular, we find inventories‐related categories among the most important ones, for instance inventories‐down‐commodity. Inventories‐related news seems to play a relevant role in driving prices, as three out of the top ten variables belong to this event type.

Moreover, we also find supply‐related news to play a relevant role in price dynamics: news related to resource discoveries, pointing to more supply of a commodity in the future, and news about spills, pointing to a decrease in supply, also prove to be among the top predictors of future prices.

9.4.3 Ensemble portfolio

In Section 9.4.1 we demonstrated that the energy basket provides overall positive returns – both in absolute terms and on a risk‐adjusted basis. However, with five models to choose from, the question becomes which one to select. A valid approach is to select, each year, the model which performed best the previous year. However, we have shown that relying on the in‐sample evidence for model selection might result in suboptimal out‐of‐sample performance: the ANN model had an in‐sample IR comparable to the best model, yet choosing it would have resulted in the worst out‐of‐sample performance. An alternative approach is to implement an ensemble (Breiman 1994; Mendes‐Moreira et al. 2012) strategy whereby we combine the predicted returns across all five models with equal weights – thereby taking an agnostic view on which model is best.
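The equal‐weight combination itself is a one‐liner in pandas (function name ours):

```python
import pandas as pd

def ensemble_prediction(model_preds):
    """Equal-weight ensemble of Section 9.4.3 (sketch): average the
    predicted returns across models, taking an agnostic view on which
    model is best.  `model_preds` has one column per model."""
    return model_preds.mean(axis=1)
```

The averaged predictions are then passed through the same weight‐normalization step as any single model's predictions.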

In Table 9.5 we repeat the performance statistics for our energy basket with an additional column added for the ensemble strategy.

Table 9.5 Out‐of‐sample performance statistics

Source: RavenPack, January 2018.

Statistics ELNET KNN ANN RF GBN Ensemble
Ann. ret. 8.5% 11.7% 4.0% 13.1% 12.3% 9.8%
Ann. vol. 15.8% 14.1% 14.7% 15.4% 16.2% 15.0%
IR 0.54 0.83 0.27 0.85 0.76 0.65
Hit ratio 51.4% 51.7% 50.6% 52.6% 51.3% 51.1%
Max DD 36.2% 19.1% 45.1% 30.7% 23.6% 38.3%

By combining the five models, we generate an IR of 0.65 with an annualized return of 9.8%. Without any prior knowledge about which of the five models performs best, we are able to construct an ensemble which is competitive in terms of returns – both in absolute and risk‐adjusted terms – and which allows for a considerable reduction of the risk associated with model selection, therefore reducing the risk of potential overfitting. This highlights why ensemble methods can be strong performers despite relying on a mixed basket of models; in this particular case, we achieve an IR of 0.65 despite giving 40% weight to models with relatively poor performance (ELNET and ANN).

Since 2015, the ensemble basket has yielded a total cumulative return of 29.3%. In comparison, the equivalent long‐only daily‐rebalanced benchmark portfolio has yielded total returns of −9.0% with an IR of −0.11.

Analyzing the time series of returns, we note the high correlation between our best models (RF and KNN) and the ensemble, underlining once again the competitiveness of the model‐agnostic ensemble approach. Meanwhile, the correlation between the ensemble and the long‐only basket is negative.

In Figure 9.2 we have plotted the cumulative returns profiles for the ensemble strategy against a long‐only basket. Overall, the ensemble strategy has performed reasonably well out‐of‐sample, though it clearly performed better in the first half of the period, when energy commodities in general plummeted across the globe. This indicates that there may have been a regime shift since the middle of 2016 – and commodities have indeed been trading sideways without much volatility since then – which the ensemble model struggles to fully capture. In Section 9.4.5, we investigate this in more detail.


Figure 9.2 Cumulative log‐returns. The red vertical line marks the beginning of the out‐of‐sample period.

Source: RavenPack, January 2018.

9.4.4 Ensemble portfolio – marginal contributions

Having determined that our ensemble approach offers a simple yet well‐performing method for combining our models, in this section we evaluate the performance contribution of each commodity to the overall portfolio. In Figure 9.3 we show the dependency of the portfolio on any single commodity, presenting the IR when systematically leaving out one commodity at a time to capture the marginal contribution of each commodity. The label on the x‐axis makes reference to the commodity which is dropped from the basket.

[Figure 9.3: bar charts of annualized return (%) and information ratio when dropping, in turn, crude oil, gasoline, heating oil and natural gas from the basket.]

Figure 9.3 Out‐of‐sample information ratios. The names on the x‐axes specify which commodity has been left out of the portfolio.

Source: RavenPack, January 2018.

By systematically removing one asset at a time from the basket we achieve IRs of 0.16–0.69 and annualized returns of 2.9–10.9%. Remarkably, removing natural gas hurts the portfolio the most, with the IR declining to 0.16, making this commodity the largest contributor to the performance of the portfolio.
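A simplified version of this leave‐one‐out exercise is sketched below; the chapter re‐runs the full portfolio construction per reduced basket, whereas here we merely recombine per‐commodity strategy returns with equal weights, which is an approximation of our own:

```python
import numpy as np
import pandas as pd

def leave_one_out_ir(strategy_returns):
    """Marginal-contribution exercise of Section 9.4.4 (simplified
    sketch): drop one commodity at a time from an equal-weight
    combination of the per-commodity strategy returns and recompute
    the annualized information ratio."""
    irs = {}
    for col in strategy_returns.columns:
        rest = strategy_returns.drop(columns=col).mean(axis=1)
        irs[col] = np.sqrt(252) * rest.mean() / rest.std()
    return irs
```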

These results also suggest that there may be additional performance to be had by modelling each commodity individually, but such a step drastically reduces the number of non‐zero entries in the explanatory variables, resulting in fewer features and potentially more biased estimates as well. For that reason we have chosen to model all four commodities together, implying that we assume that a news category has the same impact across the basket.

9.4.5 Regime detection in the ensemble portfolio

Up until now, we have traded all signals generated by our models. While such an approach is attractive from a signal diversification point of view, it may still be interesting to evaluate whether our signals perform particularly well during certain time periods. For example, volatile market regimes may provide more trading opportunities through stronger signals. Conversely, quiet markets may allow fundamental news to have a more pronounced impact on day‐to‐day returns without them being clouded by noisy fluctuations. By trading across all regimes, as is the case with the ensemble approach in Section 9.4.3, we may fail to take advantage of any regime dependency. In other words, by restricting ourselves to a subset of trades, we may be able to improve the per‐trade return, while reducing overall trading and the cost associated with it.

We seek to determine whether there are certain regimes in which the portfolio performance is particularly strong. Concretely, we illustrate this approach by creating various portfolios which only trade the underlying commodities when certain requirements related to the realized volatility are met.

We implement this by testing a volatility filter based on (i) the 1‐day lagged 21‐day volatility being above its annual average, or (ii) the 1‐day lagged 10‐day volatility being above its 21‐day equivalent. In other words, is volatility high relative to the last 12 months or rising? When at least one of the conditions is fulfilled we are in a high volatility environment, otherwise we are in a low volatility environment.
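The two conditions can be coded directly from their definitions (function and parameter names ours):

```python
import pandas as pd

def high_vol_regime(returns, short=10, long=21, lookback=252):
    """Volatility filter of Section 9.4.5 (sketch): day t is 'high-vol'
    when (i) the 1-day-lagged 21-day volatility exceeds its trailing
    12-month average, OR (ii) the 1-day-lagged 10-day volatility exceeds
    its 21-day equivalent -- i.e. volatility is high or rising."""
    vol_long = returns.rolling(long).std().shift(1)
    vol_short = returns.rolling(short).std().shift(1)
    high = vol_long > vol_long.rolling(lookback).mean()     # condition (i)
    rising = vol_short > vol_long                           # condition (ii)
    return high | rising
```

Both volatility estimates are lagged one day, so the regime classification for day t uses only information available at the close of day t−1.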

In Table 9.6 we present the results of conditioning the ensemble portfolio from Table 9.5 on the two volatility regimes (high‐vol/low‐vol). As can be observed, by conditioning on the volatility regime, we are able to find discrepancies in performance. In particular, we notice that over the in‐sample period the high‐vol signal outperformed the low‐vol one. Although both regimes provide positive returns, this observation within the training period raises the question of whether we can achieve better out‐of‐sample performance by avoiding low‐vol regimes when trading the signal.

Table 9.6 Performance statistics

Source: RavenPack, January 2018.

Statistics                 In‐sample               Out‐of‐sample
                           High‐vol    Low‐vol     High‐vol    Low‐vol
Ann. return                20.12%      14.05%      21.3%       −3.0%
Ann. volatility            16.3%       15.5%       16.9%       15.3%
IR                         1.23        0.91        1.27        −0.20
Hit ratio                  55.3%       54.6%       53.9%       47.5%
Max drawdown               35.2%       25.1%       18.0%       62.2%
Per‐trade return (bps)     8.59        8.74        8.82        −1.97
Number of trades           6262        3043        1929        811

The high‐vol and low‐vol strategies trade only during periods of high/low volatility. The ensemble strategy trades irrespective of the volatility regime.

Looking at the out‐of‐sample performance confirms this intuition, as we find that periods of high volatility yield considerably higher returns, both in absolute and in risk‐adjusted terms. In particular, the out‐of‐sample period yields an IR of 1.27 with annualized return of 21.3% for the high‐vol regime compared with an IR of −0.20 and annualized return of −3.0% for the low‐vol regime. The discrepancy is further supported by Figure 9.4, where we show that the high‐vol strategy consistently yields superior returns over the full‐sample period, and particularly so during the second half of the out‐of‐sample period, where the low‐vol strategy yields negative performance.


Figure 9.4 Cumulative log‐returns.

Source: RavenPack, January 2018.

In Figure 9.5 we compare the profiles of the out‐of‐sample cumulative returns from the ensemble and high‐vol strategies. In particular, we show that the high‐vol strategy not only yields higher returns but also has the advantage of being more robust, experiencing a positive trend across most of the out‐of‐sample period. Specifically, the high‐vol strategy provides more consistent performance, avoiding most of the negative trend shown by the ensemble during 2017, which is mostly due to the low‐vol signals. Moreover, even though the high‐vol strategy trades less than the ensemble (1929 vs 2740 trades), this is more than compensated for by the boost in per‐trade returns: by trading only the high‐vol signals, we more than double the per‐trade return compared with the ensemble, from 3.88 to 8.82 basis points.


Figure 9.5 Out‐of‐sample performance statistics with Ensemble.

Source: RavenPack, January 2018.

The ensemble method is a good starting point for developing a trading portfolio, but the results in Table 9.5 and Figures 9.4 and 9.5 underline that it fails to fully take into account potential regime shifts seen in the market. This is not surprising, since our ensemble model only includes event relevance‐scaled dummy variables based on whether a RavenPack event category was triggered or not – and only for one day. More elaborate models could include information such as lagged category triggers, news volume, sentiment or novelty filters, or even market data such as asset volatility and returns (Hafez and Lautizi 2017).

Moreover, looking at the dynamics of cumulative returns over the full sample, we notice that they seem to be affected by seasonality. Preliminary research based on splitting the signals into spring–summer vs autumn–winter showed that the latter produced considerably higher returns, confirming our intuition about seasonality. Such evidence is consistent with the hypothesis that our predictive models work better during spikes in the expected and actual commodity demand occurring in colder months. Trying to exploit seasonality to further improve performance is yet another interesting direction to investigate in future research.

9.5 CONCLUSION

In this study we set out to evaluate the performance of trading strategies for commodity futures based on commodity‐related news stories captured in RPA and a suite of five well‐known machine learning models. We create an energy commodity basket, which contains four commodity futures, including crude oil and natural gas.

We illustrate how the five machine learning models stack up against each other. An RF model is the top performer with an IR of 0.85, followed by the KNN and the GBN. With our linear model (ELNET) among the worst‐performing models, our results suggest that the added complexity the other models bring to the table – in terms of being able to detect nonlinearities, including interaction terms – considerably increases the predictive power for next‐day returns.

Nevertheless, by combining all these machine learning techniques, we demonstrate how the implementation of a simple ensemble strategy, whereby we equally weight all model predictions for the basket, produces robust results. Taking an ensemble approach has the benefit of lowering the bias and variance through aggregation of the individual models – and is furthermore agnostic with regard to which individual model is optimal. The ensemble approach yields an IR of 0.65 with an annualized return of 9.8% and a per‐trade return of 3.88 basis points. We also illustrate how the ensemble basket changes as a function of the mix of commodities entering the portfolio. Specifically, we show how the IR changes when we systematically drop one commodity at a time from the basket.

Finally, we demonstrate how volatility regimes impact the performance of our basket. In particular, by imposing trading restrictions on our ensemble basket, we show how both risk‐adjusted and per‐trade returns can be improved while lowering the number of trades and hence cost by incorporating information not utilized in the ensemble strategy. In particular, we highlight how a strategy which trades only during high volatility regimes results in an IR of 1.27 while more than doubling the per‐trade returns.

Exploratory analysis shows that our predictive signals are potentially more profitable during autumn–winter months, which suggests that further research in how to model and take advantage of such seasonality in the predictive models represents an interesting direction to further enhance these strategies.

The event taxonomy in RPA has capabilities for commodities trading well beyond those investigated in the present study. While we have taken advantage of RPA's capability to detect commodity‐related news stories at the event category level, we have disregarded other types of potentially impactful news, for example about the global economy, as well as metrics such as the Event Sentiment Score, which we used extensively earlier in our research on equities. Lastly, the framework presented herein can be easily modified or extended to include other asset classes which are mainly influenced by the macro economy, such as equity index futures, bond futures and currencies.

REFERENCES

  1. Altman, N.S. (1992). An introduction to kernel and nearest‐neighbor nonparametric regression. The American Statistician 46 (3): 175–185.
  2. Brandt, M.W. and Gao, L. (2016). Macro Fundamentals or Geopolitical Events? A Textual Analysis of News Events for Crude Oil. Duke University/University of Luxembourg.
  3. Breiman, L. (1994). Stacked regressions. Machine Learning 24 (1): 49–64.
  4. Breiman, L. (2001). Random forests. Machine Learning 45 (1): 5–32.
  5. Donoho, D.L. (2000). High‐Dimensional Data Analysis: The Curses and Blessings of Dimensionality. Stanford University, Department of Statistics.
  6. Friedman, J., Hastie, T., and Tibshirani, R. (2000). Additive logistic regression: a statistical view of boosting. The Annals of Statistics 28 (2): 337–407.
  7. Hafez, P. and Guerrero‐Colón, J.A. (2016). Earnings Sentiment Consistently Outperforms Consensus. RavenPack Quantitative Research.
  8. Hafez, P. and Koefoed, M. (2017a). Introducing RavenPack Analytics for Equities. RavenPack Quantitative Research.
  9. Hafez, P. and Koefoed, M. (2017b). A Multi‐Topic Approach to Building Quant Models. RavenPack Quantitative Research.
  10. Hafez, P. and Lautizi, F. (2016). Achieve High Capacity Strategies Trading Economically‐Linked Companies. RavenPack Quantitative Research.
  11. Hafez, P. and Lautizi, F. (2017). Abnormal Media Attention Impacts Stock Returns. RavenPack Quantitative Research.
  12. Hastie, T., Tibshirani, R., and Friedman, J. (2009). The Elements of Statistical Learning, 2e. Springer Series in Statistics, chapter 11.
  13. Mendes‐Moreira, J., Soares, C., Jorge, A.M., and de Sousa, J.F. (2012). Ensemble approaches for regression: a survey. ACM Computing Surveys 45 (1): Article 10.
  14. Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B 67 (2): 301–320.
