Feature Engineering For Algorithmic And Machine Learning Trading
In this article we focus on feature engineering based on an ensemble of securities. In another recent article we presented a method for developing long/short equity strategies based on engineered features for each security in a group.
Feature engineering is the hardest aspect of machine learning and algorithmic trading. If the features (predictors or factors) used do not have economic value, performance is unlikely to be satisfactory. Algorithmic trading and machine learning cannot find gold where there is none. The use of widely known features is unlikely to produce anything of value. Contrary to some common misconceptions, developing an algo and applying machine learning is the easy part of this process. A few operators of platforms where aspiring traders gather to test their programming skills offer known features that are “tortured until they confess to anything”. These approaches will probably fail because of data-mining bias. Note that this bias is cumulative and at some point grows out of control.
The P-ratio is the fraction of Dow 30 stocks that have a positive directional bias after the close of a given day. The directional bias is itself an engineered feature computed for each security in the Dow 30. The calculations are quite involved, but the directional bias is a probability ranging from 0% to 100% that arises from a weighted average of the probabilities of certain signals occurring; these probabilities are also engineered features based on other, more primitive features.
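The definition can be illustrated with a toy calculation. This is a hypothetical sketch, not the DLPAL algorithm: the signal probabilities, the weights, and the assumption that "positive directional bias" means a bias above 50% are all placeholders chosen for illustration.

```python
# Hypothetical sketch of the P-ratio idea. The actual DLPAL calculation is
# proprietary; the weights, signal probabilities and the 0.5 cut-off for
# "positive directional bias" below are illustrative assumptions.
from typing import Dict, List


def directional_bias(signal_probs: List[float], weights: List[float]) -> float:
    """Weighted average of signal probabilities, each in [0, 1]."""
    if not weights or len(signal_probs) != len(weights):
        raise ValueError("need one weight per signal probability")
    return sum(p * w for p, w in zip(signal_probs, weights)) / sum(weights)


def p_ratio(biases: Dict[str, float], cutoff: float = 0.5) -> float:
    """Fraction of securities whose directional bias is above `cutoff`."""
    if not biases:
        raise ValueError("empty universe")
    positive = sum(1 for b in biases.values() if b > cutoff)
    return positive / len(biases)


# Made-up numbers for four securities (not real DLPAL output).
biases = {
    "AAPL": directional_bias([0.7, 0.6], [2.0, 1.0]),  # about 0.667
    "MSFT": directional_bias([0.4, 0.3], [1.0, 1.0]),  # 0.35
    "KO": 0.55,
    "IBM": 0.45,
}
print(p_ratio(biases))  # 2 of 4 above 0.5 -> 0.5
```

With the full Dow 30 the denominator would be 30, so the ratio moves in steps of 1/30.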
The P-ratio, along with four related features, is calculated by the DLPAL PRO and DLPAL LS software. In the case of the Dow 30, we call it the P-Dow ratio. Its value ranges from 0 to 1, although the extreme values of 0 and 1 rarely occur. A rise above 0.70 generates a long signal and a drop below 0.50 generates a short signal. The asymmetry is due to the positive structural bias of equities.
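The asymmetric thresholds can be written as a small classifier. This sketch only encodes the two cut-offs stated above; returning 0 for the zone between 0.50 and 0.70 ("no new signal") is an assumption about how that zone is treated.

```python
LONG_THRESHOLD = 0.70   # above this the P-Dow ratio gives a long signal
SHORT_THRESHOLD = 0.50  # below this the P-Dow ratio gives a short signal


def p_dow_signal(ratio: float) -> int:
    """Return +1 (long), -1 (short) or 0 (no new signal) for a P-Dow ratio."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("P-Dow ratio must lie in [0, 1]")
    if ratio > LONG_THRESHOLD:
        return 1
    if ratio < SHORT_THRESHOLD:
        return -1
    return 0  # neutral zone: assumed to generate no new signal


for r in (0.83, 0.60, 0.37):
    print(r, p_dow_signal(r))
```

Both comparisons are strict, matching "rises above" and "drops below", so values of exactly 0.70 or 0.50 fall in the neutral zone.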
Below is the DLPAL LS/DLPAL PRO workspace for generating the ensemble P-Dow ratio features. The history length is set to 1848 bars to generate history from 01/04/2010 to 05/05/2017.
It took less than half a day for the program to generate the historical data file with the features on a 64-bit Windows 10 laptop with an Intel Core i5 CPU at 2.50 GHz. Below is what a historical file with the features looks like.
There are five features, but in this article we are only interested in Pratio. These features are explained in the manual. In this example we do not generate train and score files because we are developing a simple strategy; those files are needed for machine learning applications.
Next, we combine the historical file containing the features with the historical data of the SPY ETF over the same date range. We do this in Excel. This is what the combined CSV file looks like after removing the header.
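For readers who prefer a script to Excel, the combining step is an inner join on the date column. This is a sketch using only the Python standard library; the column names ("Date", "Pratio", "AvgPL", "Open", ...) and the MM/DD/YYYY date format are assumptions about the file layout, and the sample rows are placeholders, not real data.

```python
# Sketch of the Excel step in code: an inner join of the DLPAL feature file
# and SPY price data on the date column. Column names and sample rows are
# assumptions for illustration only.
import csv
import io
from typing import Dict, List


def read_by_date(text: str) -> Dict[str, Dict[str, str]]:
    """Parse CSV text into {date: row}, keyed on the 'Date' column."""
    return {row["Date"]: row for row in csv.DictReader(io.StringIO(text))}


def merge_on_date(features_csv: str, prices_csv: str) -> List[Dict[str, str]]:
    """Keep only dates present in both files, in the order of the price file."""
    features = read_by_date(features_csv)
    merged = []
    for row in csv.DictReader(io.StringIO(prices_csv)):
        feat = features.get(row["Date"])
        if feat is None:
            continue  # date missing from the feature file: drop it
        extra = {k: v for k, v in feat.items() if k != "Date"}
        merged.append({**row, **extra})
    return merged


def to_csv(rows: List[Dict[str, str]], header: bool = True) -> str:
    """Write merged rows back to CSV text, optionally without the header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    if header:
        writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Placeholder values, not real market data.
features_csv = "Date,Pratio,AvgPL\n01/04/2010,0.73,0.55\n01/05/2010,0.47,0.49\n"
prices_csv = (
    "Date,Open,High,Low,Close\n"
    "01/04/2010,100.0,101.0,99.0,100.5\n"
    "01/05/2010,100.5,102.0,100.0,101.5\n"
    "01/06/2010,101.5,102.5,101.0,102.0\n"
)
rows = merge_on_date(features_csv, prices_csv)
print(to_csv(rows, header=False))
```

An inner join drops any date that appears in only one file, which keeps the features and prices aligned bar by bar.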
We can now import the data into AmiBroker and backtest our strategy. The result of the import is shown below.
In the imported data, the Volume field holds AvgPL, Open Interest holds AvgPS, AUX1 holds Pratio and AUX2 holds AvgSPL. We use only Pratio. The trading strategy is as follows:
Buy if AUX1 > 0.70
Short if AUX1 < 0.50
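The two rules can be turned into a position series. The article gives only the Buy and Short conditions, so this sketch assumes stop-and-reverse behavior: a long is held until a short signal and vice versa, with a flat position before the first signal. The one-bar shift reflects the fact that the ratio is known at the close and acted on at the next open.

```python
from typing import List


def positions_from_pratio(pratio: List[float], long_th: float = 0.70,
                          short_th: float = 0.50) -> List[int]:
    """Position held during each bar, entered at that bar's open.

    Assumption (not stated in the article): stop-and-reverse, i.e. a long is
    held until a short signal and vice versa; flat before the first signal.
    The signal from bar t's close is acted on at bar t+1's open.
    """
    positions = [0] * len(pratio)
    current = 0
    for t in range(1, len(pratio)):
        prev = pratio[t - 1]  # only information known at the prior close
        if prev > long_th:
            current = 1
        elif prev < short_th:
            current = -1
        positions[t] = current  # neutral zone keeps the previous position
    return positions


print(positions_from_pratio([0.75, 0.60, 0.45, 0.55, 0.72, 0.65]))
# [0, 1, 1, -1, -1, 1]
```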
This is a long/short strategy. For the backtest, initial capital is $100K, equity is fully invested and commission is $0.01 per share. All orders are placed at the open of the next bar to prevent look-ahead bias. The equity curve, underwater curve, monthly returns table and Monte Carlo simulation results are shown below.
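The backtest settings can be approximated with a simple bar-by-bar simulation. This is a minimal sketch, not AmiBroker's backtester: whole shares, no slippage, no short borrow costs and no margin rules are simplifying assumptions, and the position series is expected to be already lagged so that each entry happens at the open following the signal.

```python
# Minimal long/short backtest sketch: $100K start, fully invested,
# $0.01 per share commission, fills at the bar's open.
# Whole shares, no slippage, no borrow costs and no margin rules are
# simplifying assumptions.
import math
from typing import List


def backtest(opens: List[float], closes: List[float], positions: List[int],
             capital: float = 100_000.0,
             commission_per_share: float = 0.01) -> List[float]:
    """Return end-of-bar equity; positions[t] is the position held from bar t's open."""
    cash = capital
    shares = 0  # signed: > 0 long, < 0 short
    equity = []
    for o, c, pos in zip(opens, closes, positions):
        held = (shares > 0) - (shares < 0)
        if pos != held:
            if shares != 0:  # exit the current position at the open
                cash += shares * o
                cash -= abs(shares) * commission_per_share
                shares = 0
            if pos != 0:  # enter the new position with all available equity
                n = math.floor(cash / (o + commission_per_share))
                shares = pos * n
                cash -= shares * o
                cash -= n * commission_per_share
        equity.append(cash + shares * c)  # mark to market at the close
    return equity


# Made-up prices and positions for illustration.
curve = backtest(opens=[100.0, 102.0, 101.0, 99.0],
                 closes=[101.0, 101.0, 100.0, 98.0],
                 positions=[0, 1, -1, -1])
print([round(v, 2) for v in curve])
```

Reversing from long to short at the same open pays commission on both the exit and the new entry.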
Both the equity and underwater curves show potential. Notice that the return in 2015 is 18.1% versus -0.9% for buy and hold. The Monte Carlo simulation shows that the probability of a drawdown greater than 35% is below 5%.
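A drawdown probability of this kind can be estimated by resampling trades. The sketch below bootstraps per-trade returns with replacement, compounds each sequence into an equity curve and counts how often the maximum drawdown exceeds a threshold. This is one common Monte Carlo approach and is assumed here; AmiBroker's own simulation settings may differ.

```python
import random
from typing import List


def max_drawdown(equity: List[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the peak."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def mc_drawdown_probability(trade_returns: List[float], threshold: float = 0.35,
                            runs: int = 1000, seed: int = 42) -> float:
    """Share of bootstrapped trade sequences whose max drawdown exceeds `threshold`."""
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    hits = 0
    for _ in range(runs):
        sample = rng.choices(trade_returns, k=len(trade_returns))  # with replacement
        equity = [1.0]
        for r in sample:
            equity.append(equity[-1] * (1.0 + r))  # compound each trade return
        if max_drawdown(equity) > threshold:
            hits += 1
    return hits / runs


# Made-up per-trade returns, for illustration only.
trades = [0.04, -0.02, 0.03, -0.05, 0.06, -0.01, 0.02, -0.03]
print(mc_drawdown_probability(trades, threshold=0.10))
```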
The table below summarizes the performance of the strategy.
Results from this strategy were above expectations. MAR (CAGR/Max. DD) is higher for the strategy, at 0.73 versus 0.70 for buy and hold. This indicates that these features have some economic value.
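The MAR figure is the compound annual growth rate divided by the maximum drawdown. The sketch below computes both from an equity curve, deriving the maximum drawdown from the underwater curve; the equity values are made up for illustration.

```python
from typing import List


def underwater(equity: List[float]) -> List[float]:
    """Drawdown from the running peak at each bar (0 at new highs, negative below)."""
    curve, peak = [], equity[0]
    for value in equity:
        peak = max(peak, value)
        curve.append(value / peak - 1.0)
    return curve


def cagr(equity: List[float], years: float) -> float:
    """Compound annual growth rate between the first and last equity values."""
    return (equity[-1] / equity[0]) ** (1.0 / years) - 1.0


def mar_ratio(equity: List[float], years: float) -> float:
    """MAR = CAGR / maximum drawdown (both as fractions)."""
    max_dd = -min(underwater(equity))
    if max_dd == 0:
        raise ValueError("no drawdown: MAR undefined")
    return cagr(equity, years) / max_dd


# Made-up equity curve over two years.
equity = [100.0, 120.0, 90.0, 121.0]
print(round(mar_ratio(equity, years=2.0), 4))  # CAGR 10% / max DD 25% -> 0.4
```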
This was just an example of what one can do with the features engineered by DLPAL. An example with features generated for each security in the Dow 30 can be found here.
This article was originally published on the Price Action Lab Blog.
If you have any questions or comments, I am happy to connect on Twitter: @priceactionlab
Disclaimer: No part of the analysis in this blog constitutes a trade recommendation. The past performance of any trading system or methodology is not necessarily indicative of future results. Read the full disclaimer here.
About the author: Michael Harris is a trader and best-selling author. He is also the developer of the first commercial software for identifying parameter-less patterns in price action 17 years ago. In the last seven years he has worked on the development of DLPAL, a software program that can be used to identify short-term anomalies in market data for use with fixed and machine learning models. Click here for more.