ML in Finance, specifically Backtesting

Michael Rozenvasser
Feb 22, 2021

How can ML be useful in finance?

Many financial operations require making decisions based on pre-defined rules, like option pricing, algorithmic execution, or risk monitoring. This is where the bulk of automation has taken place so far, transforming the financial markets into ultra-fast, hyper-connected networks for exchanging information. Computational speed is, obviously, one of the advantages computers have over humans.

Although automation is scaling at massive levels, the next wave of automation does not involve following rules, but making judgment calls. As emotional beings, subject to fears, hopes, and agendas, humans are not particularly good at making fact-based decisions, particularly when those decisions involve conflicts of interest. In those situations, investors are better served when a machine makes the call, based on facts learned from hard data. This applies not only to investment strategy development, but to virtually every area of financial advice: granting a loan, rating a bond, classifying a company, predicting earnings, etc. Furthermore, machines will comply with the law, always, when programmed to do so.

At Carnegie Mellon University in Pittsburgh, my subject of study was Behavioral Economics and the cognitive biases that exist, to a great degree, in behavioral finance. These biases include:

· Overconfidence Bias (a false sense of one’s skill)

· Self-Serving Bias (attributing wins to skill and losses to luck)

· Herd Mentality Bias (blindly following successful investors)

· Loss Aversion Bias (fearing losses instead of embracing and managing risk)

· Framing Cognitive Bias (making decisions based on how information is presented instead of on the facts)

· Narrative Fallacy Bias (choosing a less desirable outcome because it comes with a better story)

· Anchoring Bias (over-reliance on pre-existing data when making a decision)

· Confirmation Bias (reinforcing one’s idea with others’ false, pre-existing ideas; Confirmation Bias is the most dangerous bias in investing)

· Hindsight Bias (explaining current outcomes with mistaken beliefs about the past).

The history of capital markets is rife with sad examples of good traders making wrong decisions because of the biases presented above.


Who’s better, the human or the machine?


Do you remember when people were certain that computers would never beat humans at chess? Jeopardy? What about poker, Go, Atari games? Millions of years of evolution have fine-tuned our ape brains to survive in a hostile 3-dimensional world where the laws of nature are static. An ML algorithm can spot patterns in a 100-dimensional world as easily as in our familiar 3-dimensional one. So the best outputs come from combining human judgment (inspired by fundamental variables) with mathematical forecasts.

An introduction to backtest overfitting

Obviously, computers are not subject to human cognitive biases, but there are other biases, specific to backtesting and computer simulation, to which computers are subject.

In its narrowest definition, a backtest is a historical simulation of how a strategy would have performed had it been run over a past period of time. As such, it is a hypothetical, and by no means an experiment. At a physics lab, we can repeat an experiment while controlling for environmental variables, in order to deduce a precise cause-effect relationship. In contrast, a backtest is not an experiment, and it does not prove anything. A backtest guarantees nothing.
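To make the idea concrete, here is a minimal sketch of what "historical simulation" means in code. The strategy, the synthetic prices, and the function name are all illustrative, not from any real dataset or library.

```python
# Minimal sketch of a backtest loop: simulate a simple moving-average
# "hold when price is above trend" rule over a fixed historical series.
# All data here is synthetic and for illustration only.

def backtest_ma_crossover(prices, window=3):
    """Hold the asset whenever the last price is above its trailing moving average.

    Returns the cumulative return of the simulated strategy. Note what is
    ignored: transaction costs, slippage, shorting costs -- which is exactly
    why a backtest is a hypothetical, not an experiment.
    """
    equity = 1.0
    for t in range(window, len(prices)):
        ma = sum(prices[t - window:t]) / window
        if prices[t - 1] > ma:  # the decision uses only data known before t
            equity *= prices[t] / prices[t - 1]
    return equity - 1.0

# Synthetic price path
prices = [100, 101, 103, 102, 105, 107, 106, 110]
print(f"simulated strategy return: {backtest_ma_crossover(prices):.2%}")
```

The single number this produces is one path through one history; rerunning it on a different past would give a different answer, which is the sense in which it "proves nothing."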

Backtest overfitting is arguably the most important open problem in all of mathematical finance (overfitting is a modeling error that occurs when a function is too closely fit to a limited set of data points). It is the equivalent of “P versus NP” in computer science (the P versus NP problem is a major unsolved problem in computer science: it asks whether every problem whose solution can be quickly verified can also be solved quickly). If there were a precise method to prevent backtest overfitting, anyone would be able to take backtests to the bank. A backtest would be almost as good as cash, rather than a sales pitch. Investors would risk less, and would be willing to pay higher fees. Hedge funds would allocate funds to portfolios with confidence, knowing that backtested results were in no shape or form influenced by the data sample inputs. Harshly put, if your model cannot perform in the real world, then your model is good for nothing.
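The parenthetical definition of overfitting can be shown in a few lines. The sketch below, using made-up numbers, fits a curve that passes exactly through every training point (zero in-sample error) and then asks it to extrapolate; the prediction lands far from the underlying trend.

```python
# Toy illustration of overfitting: a polynomial that interpolates every
# training point exactly can still predict poorly out of sample.
# The data is synthetic: a linear trend y = x plus a little noise.

def lagrange_interpolate(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [0.1, 0.9, 2.2, 2.8, 4.1]  # linear trend plus noise

# In-sample fit is perfect by construction...
for x, y in zip(train_x, train_y):
    assert abs(lagrange_interpolate(train_x, train_y, x) - y) < 1e-9

# ...but extrapolating one step past the data goes badly wrong
prediction = lagrange_interpolate(train_x, train_y, 5.0)
print(f"prediction at x=5: {prediction:.2f} (true trend value ~5.0)")
# -> prediction at x=5: 10.10 (true trend value ~5.0)
```

A backtest that has been tuned against its own history is in the same position as this polynomial: flawless in sample, unreliable the moment it meets data it has not seen.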

What is the point of a backtest then?


It is a sanity check on a number of variables, including bet sizing, turnover, resilience to costs, and behavior under a given scenario. A good backtest can be extremely helpful, but backtesting well is extremely difficult. In 2014, a team of quants at Deutsche Bank, led by Yin Luo, published a study under the title “Seven Sins of Quantitative Investing” (Luo et al. [2014]). In this piece the team mentions:

  • Survivorship bias: Using today’s investment universe as if it were the historical one, not realizing that some companies went bankrupt and some securities were delisted. A good example of survivorship bias is using today’s data to analyze finance-sector companies without taking into account the 2008 stock market meltdown, in which many financial firms, like Lehman Brothers and Bear Stearns, went to zero.
  • Look-ahead bias: Using information that was not public at the moment the simulated decision would have been made. Be certain about the timestamp of each data point; take into account release dates, release delays, etc. For example, when analyzing a certain stock, one may use sales data, ticker volume, and P/E, but fail to account for a pending lawsuit that may materially affect the stock price.
  • Storytelling: Making up a story after the fact to justify a random pattern. This bias is similar to the Hindsight Bias, where good backtesting results are attributed to some pattern when, in reality, no pattern exists.
  • Data mining and data snooping: Training the model on the testing set. Oftentimes, the only accessible data is “testing” data that does not correlate well with actual, real-time events.
  • Transaction costs: Simulating transaction costs is hard because the only way to be certain about a cost would have been to do the actual trade. Transaction costs are difficult to simulate because the frequency and volume in the real-time environment may differ from the testing environment. Also, transaction costs for derivatives, such as options and fixed-income derivative contracts, are especially difficult to simulate because these instruments are subject to the movement of the underlying instrument, market volatility, and counterparty risk.
  • Outliers: Basing a strategy on a few extreme past outcomes that may never happen again. This is one of the biggest problems when normalizing any time series: left- and right-tail outliers skew the entire series. For example, say a hedge fund returned 1% in each of the past 11 months and 10% in month 12. The 10% month will positively skew the whole series, even though such performance may never occur again. Neither ignoring such a month nor taking it at face value is entirely correct; in such a case, two separate backtests need to be performed and their results combined to get a better predictive result.
  • Shorting: Taking a short position requires finding a lender. The cost of lending and the amount available are generally unknown, and depend on relationships, inventory, relative demand, etc. The recent market events surrounding the well-publicized short squeeze on GameStop (GME) stock by Robinhood traders show how difficult it is to simulate shorting.
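The first two sins lend themselves to small code guards. The sketch below, built on made-up records, shows a point-in-time filter that prevents look-ahead bias, and a universe definition that keeps dead names to avoid survivorship bias. All tickers, dates, and field names are hypothetical.

```python
from datetime import date

# Guarding against look-ahead bias: every data point carries the date it
# became PUBLIC, and a simulated decision may only see records released
# on or before the decision date. Records below are made up.
earnings = [
    {"ticker": "AAA", "eps": 1.2, "released": date(2020, 2, 15)},
    {"ticker": "AAA", "eps": 1.5, "released": date(2020, 5, 14)},
]

def known_as_of(records, as_of):
    """Return only the records an investor could have seen on `as_of`."""
    return [r for r in records if r["released"] <= as_of]

# Simulating a decision on 2020-03-01: the May release is not visible yet
visible = known_as_of(earnings, date(2020, 3, 1))
assert [r["eps"] for r in visible] == [1.2]

# Guarding against survivorship bias: the historical universe must include
# names that later went to zero, not just today's survivors.
universe_2007 = ["AAA", "LEH", "BSC"]  # LEH and BSC later collapsed
universe_today = ["AAA"]               # backtesting only on this overstates returns
```

The point of the filter is that the release date, not the period a number refers to, determines when it can enter a simulated decision.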

These are just a few of the basic errors that most papers published on backtesting routinely make. Others include ignoring hidden risk, focusing only on returns while ignoring other metrics, ignoring the existence of stop-out limits or margin calls, etc.

What are some ways to prevent overfitting?

  • 1.) Develop models for entire asset classes or investment universes, rather than for specific securities. Investors diversify, hence they do not make mistake X only on security Y.
  • 2.) Apply bagging (bagging attempts to reduce the chance of overfitting complex models: it trains a large number of “strong” learners in parallel, a strong learner being a model that’s relatively unconstrained, and then combines all the strong learners in order to “smooth out” their predictions) as a means to both prevent overfitting and reduce the variance of the forecasting error. If bagging deteriorates the performance of a strategy, it was likely overfit to a small number of observations or outliers.
  • 3.) Do not backtest until research is complete.
  • 4.) Record every backtest conducted on a dataset so that the probability of backtest overfitting may be estimated on the final selected result, and the Sharpe ratio may be properly deflated by the number of trials carried out.
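Point 2 can be sketched in a few lines. This is a minimal, assumption-laden illustration of bagging: the "strong learner" here is just a sample-mean forecaster, and the return series is invented, but the bootstrap-then-average structure is the technique itself.

```python
import random

# Minimal sketch of bagging: fit many learners on bootstrap resamples of
# the history, then average their forecasts to reduce variance. The
# "learner" and the return data below are illustrative placeholders.

def fit_mean(sample):
    """A trivially simple learner: forecast the mean of its training sample."""
    return sum(sample) / len(sample)

def bagged_forecast(returns, n_learners=200, seed=42):
    rng = random.Random(seed)
    forecasts = []
    for _ in range(n_learners):
        # Bootstrap: resample the history with replacement
        boot = [rng.choice(returns) for _ in range(len(returns))]
        forecasts.append(fit_mean(boot))
    # Averaging smooths out the influence of any single outlier draw
    return sum(forecasts) / len(forecasts)

returns = [0.01, -0.02, 0.015, 0.03, -0.01, 0.005]
print(f"bagged forecast: {bagged_forecast(returns):.4f}")
```

If replacing the single fitted model with this average makes the strategy look much worse, that is the warning sign named above: the original fit was leaning on a handful of observations or outliers.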

There could be full textbooks written about the art of backtesting. In my opinion, research, work ethic, data quality, and a robust analytical framework are the keys to successful backtesting. I will leave you with this: “Backtesting while researching is like drinking and driving. Do not research under the influence of a backtest.” — Marcos Lopez de Prado.
