Backtesting Introduction

There are entire books and excellent blog posts dedicated to backtesting. I won't regurgitate the content from these resources, but I would like to add some context to this project. I will review some of the basics of backtesting and touch on points that will become important as I build the options backtesting system.

In the most general sense, backtesting refers to testing the results of a predictive model using historic data. Algorithmic traders use backtesting to validate that their strategy can make money. Naively, traders may consider a backtest a good indicator of future performance. While this may be the case, there are many things that need to be considered when building a backtest. If you are skilled (or lucky) enough to find edge systematically with your trading model, that edge can erode quickly due to market regime changes or other traders chasing the same methods. That means it is important to continuously refine the backtest over time. There are many reasons to backtest a trading strategy. Backtesting provides a way to filter the trading strategies under consideration, make adjustments and refinements to the trading model (and even find model errors), and get an idea of optimal parameter values (discussed below).

Backtesting can be as simple or as complex as the trader is willing to make it. At the simple end, you can use a tool like ETF Replay. In the free version, you can select a portfolio of ETFs, their weights, and a rebalance schedule and see how the portfolio would have performed through time. In this backtest, the ETF weights and rebalance period are the parameters of the simple predictive model. At the other end, you can build robust, completely customized strategies with tools like Quantopian. In between, you can use off-the-shelf tools like TradeStation, which do most of the heavy lifting for you.

As I discussed in my previous post, most backtesting software focuses on equities and futures, which is why I'm building my own. Like any scientific endeavor, there are many considerations.

Data Quality

Data quality is one of the most important parts of any statistical test. Backtesting a trading strategy is no different. Whether you're running an ultra low latency statistical arbitrage strategy or you're Warren Buffett, you need good quality data against which to run your models and make decisions. I'll be discussing data and data quality in more detail in future posts.

Training and Testing (i.e. in Sample and Out of Sample)

As I'll discuss below, most trading models have parameters to be optimized. It is important to break the time series data into separate groups. The first group is called the training set (or in sample data) and the second group is called the test set (or out of sample data). Quite simply, most systems will use the training set to optimize the model parameters. The second set is then used to determine how well the model performs with those optimized parameters.

Figure: Training set and test set.

This process is more an art than a science, but there are some rules of thumb. Generally, 70-75% of the data is used for the training set while the remainder is held back for the test set. Splitting the data becomes a challenge even in the "simple" case of using persistent securities like equities. For example, if you're backtesting a weighting scheme with 10 securities and one goes bankrupt (effectively disappears), your data needs to account for a portion of the portfolio going to zero. Even with futures, backtesting often uses continuous adjusted prices to account for contango or backwardation when rolling contracts at maturity. In the case of options, securities disappear regularly at expiration. Consider a calendar spread (a position composed of two options with the same strike but differing expiration dates): you can't very well split your data series into train and test data between the two expirations.
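To make the split concrete, here is a minimal sketch in Python. The function name `chronological_split` and the sample prices are my own illustration, not taken from any library. The one thing it is meant to show is that time series data is split by position and never shuffled, so the test set always comes after the training set.

```python
def chronological_split(series, train_fraction=0.75):
    """Split an ordered series into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical daily closing prices, oldest first.
prices = [100 + i for i in range(12)]
train, test = chronological_split(prices, train_fraction=0.75)
print("train:", train)
print("test: ", test)
```

This only handles a single, persistent price series. It does nothing about securities that disappear, which is exactly the complication described above for options.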

Despite the challenges, training and testing data are crucial. One technique discussed in "Building Winning Algorithmic Trading Systems: A Trader's Journey From Data Mining to Monte Carlo Simulation to Live Trading" by Kevin Davey is walk forward optimization. There are a few ways to do it, but in essence you split your data into train and test sets, then slide your training data forward through each iteration, recording the optimal parameter values as you go. Approaches differ in what final parameter values to use, but I would suggest first building a distribution of the values and taking the mean. Building a distribution will give you an idea of how stable the parameters are through time.

Figure: Walk forward testing.

Again, there is an excellent discussion on walk forward testing in Kevin Davey's book. I will strive to use this method in my system.
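As a rough sketch of the sliding-window idea (not Kevin Davey's implementation, and not yet the one in my system), the windows and the per-window optimal parameters can be generated like this. The names `walk_forward_windows` and `walk_forward_parameters` are hypothetical, and `optimize` stands for whatever routine returns the best parameter value for a training window. In the usage lines I pass `max` purely as a stand-in so the example runs.

```python
from statistics import mean


def walk_forward_windows(n_obs, train_size, test_size):
    """Yield (train_range, test_range) index pairs, sliding forward by test_size."""
    start = 0
    while start + train_size + test_size <= n_obs:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size


def walk_forward_parameters(data, optimize, train_size, test_size):
    """Run `optimize` on each training window and collect the results."""
    return [
        optimize([data[i] for i in train])
        for train, _ in walk_forward_windows(len(data), train_size, test_size)
    ]


data = list(range(1, 11))  # placeholder observations 1..10
# `max` is only a stand-in for a real parameter optimizer.
optimal_values = walk_forward_parameters(data, max, train_size=4, test_size=2)
print(optimal_values)        # one value per training window
print(mean(optimal_values))  # the mean of the distribution
```

Looking at the spread of `optimal_values`, and not just the mean, is what gives a feel for how stable the parameters are through time.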

Parameter Optimization and Curve Fitting

Most trading systems have parameters that are optimized. As I alluded to above, this means the backtest is run over and over again until the parameters which result in some optimal metric (such as profit) are found. Usually, the optimization seeks the highest Sharpe Ratio, which is a common performance metric. In more advanced tests you may seek to minimize drawdown or a volatility estimator. Naive optimization often leads to curve fitting, which I discuss below, and to poor performance in out of sample tests.
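Here is a minimal sketch of what "run the backtest over and over and keep the best Sharpe Ratio" looks like in code. I am assuming daily returns, 252 trading days per year, and a zero risk-free rate by default. `grid_search`, `run_backtest`, and the toy return series are hypothetical names and data used only for illustration.

```python
import math
from statistics import mean, stdev


def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe Ratio of a series of per-period returns."""
    excess = [r - risk_free for r in returns]
    spread = stdev(excess)  # sample standard deviation, needs >= 2 returns
    if spread == 0:
        return 0.0
    return mean(excess) / spread * math.sqrt(periods_per_year)


def grid_search(param_grid, run_backtest):
    """Return (best_params, best_sharpe) over every candidate in param_grid."""
    scored = [(params, sharpe_ratio(run_backtest(params))) for params in param_grid]
    return max(scored, key=lambda pair: pair[1])


# Toy "backtests": parameter tuple -> made-up daily returns.
toy_backtests = {
    (5, 20): [0.01, -0.01, 0.01],
    (10, 50): [0.01, 0.02, 0.03],
}
best_params, best_sharpe = grid_search(toy_backtests, toy_backtests.get)
print(best_params, round(best_sharpe, 2))
```

An exhaustive search like this is exactly the naive optimization that invites curve fitting, so it should only ever be run on the training set.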

Parameters may be unstable, meaning the optimized metric changes a lot for slight changes in the parameters. This is discussed on the great blog QuantStrat TradeR. The author talks about a "stable region" described in "Trading Systems: A New Approach to System Development and Portfolio Optimisation". This is an excellent approach to understanding how well your model behaves with the selected parameters. Further, it provides insight into the model itself.

Curve fitting is common in backtesting and provides an unrealistic sense of the quality of a model. Curve fitting in the world of backtesting is often a symptom of optimizing parameters without an out of sample period. For example, if your historic data runs from January 2000 through December 2009 and you use the entire period to optimize the parameters of your trading model, there is a very good chance your backtest will perform very well. If you actually tried to trade it, however, it would likely perform terribly. This is because your parameters were optimized for the data provided, with no chance to test them in the "wild".

Figure: Overfitting in action.

As you can see in the image above, each time the red line is updated, it fits the data better and better. This is a visual example of overfitting in practice. If you tried to apply the resulting model to a data set that did not look like the original data set, the error would be huge.

In a recent paper, "Backtest Overfitting Demonstration Tool: An Online Interface", the authors provide an online tool to demonstrate what can happen with an overfitted model. They define overfitting as:

In mathematical Finance, backtest overfitting means the usage of historical market data (a backtest) to develop an investment strategy, where too many variations of the strategy are tried, relative to the amount of data available. Backtest overfitting is now thought to be a primary reason why quantitative investment models and strategies that look good on paper often disappoint in practice.

This is taken directly from their site:

Graphic on left shows an investment strategy (in blue) making steady profit while the underlying trading instrument (in green) gyrates in price. This website is designed to illustrate that this type of "optimal" investment strategy is all too easy to produce if enough variations are tried, yet it is typically ineffective moving forward, a consequence of backtest overfitting. Graphic on right shows the same investment strategy performing poorly on a different sample of the same trading instrument, which demonstrates that an overfit investment strategy is quite likely to fail in real world investing.

I strongly suggest taking the time to read the paper and investigate their examples.
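The "too many variations" idea can be reproduced with a small simulation. This is my own toy sketch, not the authors' tool. The market returns are pure noise, each "strategy" is a random long/short pattern repeated every `period` days, and all names and numbers (`simulate_variations`, 500 strategies, 250 days) are assumptions chosen for illustration.

```python
import random
from statistics import mean


def simulate_variations(n_strategies, n_days, period=20, seed=0):
    """Score random strategies on pure-noise returns, in and out of sample.

    Returns one (in_sample_mean, out_of_sample_mean) pair of mean daily
    returns per strategy.
    """
    rng = random.Random(seed)
    # The "market" is pure noise, so no strategy has any real edge.
    in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
    out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]

    results = []
    for _ in range(n_strategies):
        # A random long (+1) / short (-1) pattern that repeats every `period` days.
        pattern = [rng.choice((-1, 1)) for _ in range(period)]
        is_mean = mean(pattern[i % period] * r for i, r in enumerate(in_sample))
        oos_mean = mean(pattern[i % period] * r for i, r in enumerate(out_sample))
        results.append((is_mean, oos_mean))
    return results


results = simulate_variations(n_strategies=500, n_days=250)
# Pick the strategy that looked best in sample, then see how it did afterwards.
best_in_sample, its_out_of_sample = max(results, key=lambda pair: pair[0])
print(f"best in-sample mean daily return: {best_in_sample:.5f}")
print(f"same strategy out of sample:      {its_out_of_sample:.5f}")
```

Because the returns are noise, the best in-sample result is selection luck, and I would expect its out of sample figure to sit much closer to zero. Raising `n_strategies` makes the in-sample winner look better without making it any more real.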

Over Parameterized Model

A trading model with a lot of parameters can be problematic. The more parameters to optimize, the more risk of overfitting exists. Consider an equity backtesting model using a moving average crossover as a trading signal. When the "fast" moving average crosses above the "slow" moving average, the system returns a buy signal. When the opposite occurs, the system returns a sell signal. Of course "fast" and "slow" are arbitrary, so you might consider running an optimization routine to find which parameters maximize the Sharpe Ratio of the strategy. In this case, there are only two parameters to optimize so your chances of overfitting your model to the data are relatively small. Now consider a complex strategy that uses moving averages, oscillators, RSI, and standard deviation to determine a trading signal. You run a similar backtest only this time there are 10 parameters to optimize. Your backtest may find a few opportunities within the training data but when applied to the test set, it will find few if any.
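For reference, the two-parameter crossover model described above fits in a few lines. This is a minimal sketch using simple moving averages. The function names and the sample prices are mine, and `fast` and `slow` are the only two parameters an optimizer would touch.

```python
def sma(prices, window):
    """Simple moving average; None until `window` prices are available."""
    averages = []
    for i in range(len(prices)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(prices[i + 1 - window:i + 1]) / window)
    return averages


def crossover_signals(prices, fast, slow):
    """Return 'buy', 'sell', or None per bar from a fast/slow SMA crossover."""
    if fast >= slow:
        raise ValueError("fast window must be shorter than slow window")
    fast_ma, slow_ma = sma(prices, fast), sma(prices, slow)
    signals = [None] * len(prices)
    for i in range(1, len(prices)):
        if slow_ma[i - 1] is None:
            continue  # not enough history for the slow average yet
        previous = fast_ma[i - 1] - slow_ma[i - 1]
        current = fast_ma[i] - slow_ma[i]
        if previous <= 0 < current:
            signals[i] = "buy"   # fast crossed above slow
        elif previous >= 0 > current:
            signals[i] = "sell"  # fast crossed below slow
    return signals


prices = [5, 4, 3, 2, 3, 4, 5, 4, 3, 2]  # made-up closes
print(crossover_signals(prices, fast=2, slow=3))
```

Every extra indicator bolted onto a model like this adds at least one more window or threshold to tune, which is how a model ends up with 10 parameters.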

I don't plan to use technical indicators on underlying price data to determine option strategies. Rather, I plan to use volatility and the options' sensitivities to enter and exit trades. For example, DTR Trading uses days to expiration as parameters to enter and exit iron condors. While the concept may be difficult to grasp at first, these are still parameters that need optimization, especially in premium generating strategies such as selling iron condors, where time plays a critical role. Therefore, it is important to find the optimal time parameters for the strategy.


Most off-the-shelf backtesting systems prevent some of the most obvious biases. But when building a backtesting system from scratch, it is critical to have a clear grasp of what these are and how to prevent them.

Survivorship Bias

Survivorship bias occurs when the system ignores a security that has become worthless within the backtesting time frame. In poor quality datasets, the dead company will be removed from the entire data feed or the backtest logic will ignore it. This obviously overestimates the performance of your system. In the context of options trading, options by definition have a finite life span. Survivorship bias will be an interesting challenge in this system and one to pay close attention to.
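A toy equity example shows how large the effect can be. The tickers and prices are invented, and I am assuming that a ticker missing from the ending prices went to zero, which is a simplification.

```python
def equal_weight_return(start_prices, end_prices):
    """Equal-weight buy-and-hold return.

    Assumption: a ticker missing from end_prices is treated as worthless.
    """
    returns = [
        end_prices.get(ticker, 0.0) / price - 1.0
        for ticker, price in start_prices.items()
    ]
    return sum(returns) / len(returns)


start = {"AAA": 100.0, "BBB": 50.0, "CCC": 20.0}
end = {"AAA": 120.0, "BBB": 55.0}  # CCC went bankrupt and left the data feed

# Honest backtest: the dead company stays in the starting universe.
honest = equal_weight_return(start, end)

# Biased backtest: the universe is rebuilt from the survivors only.
survivors = {ticker: price for ticker, price in start.items() if ticker in end}
biased = equal_weight_return(survivors, end)

print(f"with the dead company: {honest:.1%}")
print(f"survivors only:        {biased:.1%}")
```

The only difference between the two numbers is which universe the backtest starts from, and that choice is made by the data, not by the strategy.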

Look Ahead Bias

As the name suggests, look ahead bias refers to the system using data that it should not know about at the time it's used. For a simple example, consider a system that trades on the daily low price. The day's low price is not known until the conclusion of the trading day, and it is not reasonable to expect the system to pick the day's low price within the trading day. In this case, the backtest is using information that should not be known at the time to place trades. This type of bias will be prevalent in backtesting options strategies.
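The daily low example can be sketched as follows. The bars are made up, and the two function names are hypothetical. Both versions sell at the close, and the only difference is the entry price the backtest pretends it could get.

```python
def returns_buying_at_low(bars):
    """Biased: assumes a fill at the day's low, which is unknown until the close."""
    return [bar["close"] / bar["low"] - 1.0 for bar in bars]


def returns_buying_at_open(bars):
    """Realistic: uses the open, a price known at the moment of the trade."""
    return [bar["close"] / bar["open"] - 1.0 for bar in bars]


# Hypothetical daily bars.
bars = [
    {"open": 100.0, "low": 95.0, "close": 98.0},
    {"open": 98.0, "low": 97.0, "close": 101.0},
    {"open": 101.0, "low": 99.0, "close": 99.5},
]

biased = returns_buying_at_low(bars)
honest = returns_buying_at_open(bars)
print("buying at the low: ", [f"{r:.2%}" for r in biased])
print("buying at the open:", [f"{r:.2%}" for r in honest])
```

Since the low can never be above the close, the biased version cannot show a losing day, which is a good hint that something is wrong with a backtest.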

In this backtesting introduction, I laid out some of the fundamental considerations of backtesting trading strategies. In the next post, I will detail the finer points that I will consider closely in building this system, specifically how to handle the challenges of backtesting options strategies.
