Gort et al. 2022
“Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting” by Berend Jelmer Dirk Gort, Xiao-Yang Liu, Xinghang Sun, Jiechao Gao, Shuaiyu Chen and Christina Dan Wang (arXiv:2209.05559, v6 dated 31 January 2023; presented at the ACM ICAIF ’22 workshop and the AAAI ’23 Bridge on AI for Financial Services) is the key skeptical source in this vault on reinforcement-learning trading. Its premise is a direct rebuke of the optimistic RL-trading literature: existing work “applied deep reinforcement learning methods and optimistically reported increased profits in backtesting, which may suffer from the false positive issue due to overfitting.” The paper does not propose a profitable strategy; it proposes a test for rejecting strategies that only look profitable, and then shows that even its best surviving agent lost money.
The methodological contribution is to formulate backtest-overfitting detection as a hypothesis test in the Neyman-Pearson framework: H0 (the agent is not overfitted: the probability of overfitting p is below the significance level alpha) versus H1 (overfitted: p is at or above alpha). Crypto trading is framed as a Markov Decision Process in which the state holds cash, holdings, prices and technical indicators (size 81 for ten coins) and the reward is the change in portfolio value. The probability of overfitting p is estimated by combinatorial cross-validation: the training period is split into N=5 equal groups and k=2 of them are held out for validation, yielding J = C(5,2) = 10 train/validation combinations. Validation returns are stacked into a matrix and split into in-sample and out-of-sample halves, and p is the share of such splits in which the candidate ranked best in-sample falls below the median out-of-sample. This is the Probability of Backtest Overfitting logic of Bailey, Borwein, López de Prado and Zhu, here adapted to DRL agents. It is a more demanding test than the single-split walk-forward method that dominates DRL-trading practice, which validates on only one market situation and so “can easily result in model overfitting.” It is closely related to the vault’s Combinatorial Purged Cross-Validation concept.
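The estimation procedure can be sketched in a few lines of Python. This is a minimal illustration of the general Probability of Backtest Overfitting recipe, not the paper's or FinRL_Crypto's actual code: the function names `cv_splits` and `estimate_pbo`, the use of mean return as the performance score, and the default of four row blocks are all assumptions made for the example.

```python
from itertools import combinations

import numpy as np


def cv_splits(n_groups=5, k_val=2):
    """Enumerate every (train_groups, val_groups) pair for combinatorial CV."""
    groups = range(n_groups)
    return [
        (tuple(g for g in groups if g not in val), val)
        for val in combinations(groups, k_val)
    ]


def estimate_pbo(returns, n_blocks=4):
    """Estimate the probability of overfitting from a (time x candidates) matrix.

    Rows are cut into n_blocks contiguous blocks. Every way of choosing half of
    the blocks as in-sample (the rest out-of-sample) is scored, and the result
    is the share of splits where the in-sample winner is below the
    out-of-sample median.
    """
    blocks = np.array_split(np.arange(returns.shape[0]), n_blocks)
    n_candidates = returns.shape[1]
    logits = []
    for is_ids in combinations(range(n_blocks), n_blocks // 2):
        oos_ids = [b for b in range(n_blocks) if b not in is_ids]
        is_rows = np.concatenate([blocks[b] for b in is_ids])
        oos_rows = np.concatenate([blocks[b] for b in oos_ids])
        # Assumption: mean return is the performance score for ranking.
        is_perf = returns[is_rows].mean(axis=0)
        oos_perf = returns[oos_rows].mean(axis=0)
        best = int(np.argmax(is_perf))
        # Relative out-of-sample rank of the in-sample winner, kept inside (0, 1).
        rank = np.sum(oos_perf <= oos_perf[best])
        omega = rank / (n_candidates + 1)
        logits.append(np.log(omega / (1 - omega)))
    # A negative logit means the in-sample winner ranked below the OOS median.
    return float(np.mean(np.array(logits) < 0))
```

With the defaults, `len(cv_splits())` is 10, matching J = 10 for N=5 and k=2. An estimate from `estimate_pbo` would then be compared with the significance level alpha to accept or reject the agent.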
Gort et al. 2022 [proposes_model] Markov Decision Process Trading Model
Gort et al. 2022 [tests_strategy] Reinforcement Learning Trading Policy
Gort et al. 2022 [defines] Probability of Backtest Overfitting
The empirical test is deliberately brutal. Three DRL algorithms (PPO, TD3 and SAC) are trained on five-minute data for ten high-volume coins (AAVE, AVAX, BTC, NEAR, LINK, ETH, LTC, MATIC, UNI, SOL) over February-April 2022, then evaluated out-of-sample from 1 May to 27 June 2022, a window containing two crypto market crashes. Transaction fees of 0.3% per trade are charged, and a Cryptocurrency Volatility Index (CVIX) risk control liquidates holdings when volatility spikes. Agents with p at or above 10% are rejected. On that rule the overfitting test does its job: the walk-forward PPO agent scored p = 17.5% (rejected) and the K-fold agent p = 7.9% (accepted), while among the three DRL agents trained under the combinatorial scheme p_PPO = 8.0% (accepted), p_TD3 = 9.6% (accepted, marginal) and p_SAC = 21.3% (rejected). Among those three, the less-overfitted agents were also the better performers, as the hypothesis predicts, which is consistent with overfitting control selecting for relative robustness. The pattern does not extend to the conventionally validated PPO agents: the accepted K-fold agent lost more than the rejected walk-forward agent.
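To make the cost treatment concrete, the following is a minimal sketch of one environment step in which the reward is the change in portfolio value after a 0.3% fee. It is not the FinRL_Crypto implementation: the names `portfolio_value` and `step` are hypothetical, the fee is assumed to be charged on traded notional, and slippage, market impact, position limits and the CVIX liquidation rule are deliberately left out.

```python
import numpy as np

FEE_RATE = 0.003  # 0.3% per trade, as stated in the text


def portfolio_value(cash, holdings, prices):
    """Cash plus the mark-to-market value of all coin holdings."""
    return float(cash + np.dot(holdings, prices))


def step(cash, holdings, prices, trades, next_prices, fee_rate=FEE_RATE):
    """Apply signed trades (units per coin) at `prices`, then mark to `next_prices`.

    Returns (new_cash, new_holdings, reward), where the reward is the change
    in portfolio value over the step, net of fees.
    """
    holdings = np.asarray(holdings, dtype=float)
    prices = np.asarray(prices, dtype=float)
    trades = np.asarray(trades, dtype=float)
    next_prices = np.asarray(next_prices, dtype=float)

    value_before = portfolio_value(cash, holdings, prices)
    notional = trades * prices  # positive = buy, negative = sell
    # Assumption: the fee is proportional to the absolute traded notional.
    fees = fee_rate * np.abs(notional).sum()
    new_cash = cash - notional.sum() - fees
    new_holdings = holdings + trades
    reward = portfolio_value(new_cash, new_holdings, next_prices) - value_before
    return new_cash, new_holdings, reward
```

Under these assumptions, buying one unit at a price of 100 with no subsequent price move yields a reward of -0.3, the fee alone. On five-minute data that drag is paid on every rebalance, which is why costed backtests diverge so sharply from uncosted ones.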
Gort et al. 2022 [supports] Combinatorial Purged Cross-Validation
Gort et al. 2022 [contradicts] Reinforcement Learning Trading Policy
Overfitting in Quantitative Trading [causes] Reinforcement Learning Trading Policy
The decisive, skeptical data point is in the cumulative-return table. With initial capital of $1,000,000, every method lost money over the test window: the accepted PPO agent -34.96%, TD3 -59.08%, SAC -59.48%, walk-forward PPO -49.39%, K-fold PPO -55.54%, the equal-weight portfolio -47.78%, and the S&P Cryptocurrency Broad Digital Market Index benchmark -50.78%. The “best” agent therefore won only in a loss-minimisation sense — it lost 35% instead of 50-60%. The paper’s own conclusion that this offers “confidence in possible deployment to a real market” is the optimistic gloss; the underlying numbers show a strategy that, deployed live with real capital over those eight weeks, would have destroyed a third of the portfolio. Read against this vault’s research goal, the finding is cautionary, essentially negative: it substantiates that overfitting detection is necessary and that walk-forward backtesting is unreliable, but it provides no evidence of tradeable absolute profitability — the opposite, in fact.
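The arithmetic behind the return table can be checked with a short script. The return figures and the $1,000,000 starting capital are taken from the text above; the helper names `final_capital` and `excess_over_benchmark` and the short row labels are introduced here purely for illustration.

```python
INITIAL_CAPITAL = 1_000_000  # USD, as stated in the text

# Cumulative returns over the test window, in percent, as reported in the text.
RETURNS_PCT = {
    "PPO (accepted)": -34.96,
    "TD3": -59.08,
    "SAC": -59.48,
    "PPO walk-forward": -49.39,
    "PPO K-fold": -55.54,
    "Equal-weight": -47.78,
    "Market index": -50.78,
}
BENCHMARK = "Market index"


def final_capital(return_pct, initial=INITIAL_CAPITAL):
    """Ending capital in USD for a cumulative return given in percent."""
    return initial * (1 + return_pct / 100)


def excess_over_benchmark(name):
    """Return difference versus the market index, in percentage points."""
    return RETURNS_PCT[name] - RETURNS_PCT[BENCHMARK]


for name, ret in RETURNS_PCT.items():
    print(
        f"{name:<18} {ret:>7.2f}% "
        f"{final_capital(ret):>12,.0f} USD "
        f"{excess_over_benchmark(name):>+7.2f} pp vs index"
    )
```

The accepted PPO agent ends at about $650,400: 15.82 percentage points ahead of the market index, yet still roughly $349,600 below the starting capital. The accepted TD3 agent, by contrast, finishes 8.30 percentage points behind the index.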
For grading, the paper is unusual in that it scores well on rigour precisely while delivering a negative result. It is genuinely out-of-sample, includes transaction costs (0.3%/trade), uses real benchmarks (market index and equal-weight), applies a state-of-the-art overfitting diagnostic and, notably, releases replication code (the open-source FinRL_Crypto framework at github.com/berendgort/FinRL_Crypto). What it cannot show is profit: the best accepted agent lost ~35%, slippage and market impact are not modelled on five-minute crypto data where they would be material, and the test window is a single two-month crash. The honest grade is therefore negative, not because the methodology is weak, but because the well-conducted experiment ends in an absolute loss and suggests that the prior literature’s positive crypto-DRL backtests are likely overfitting artefacts. It is the necessary skeptical counterweight to Zhang Zohren Roberts 2019: where Zhang et al. show a costed positive backtest can exist, Gort et al. show how easily such backtests are false positives, and that surviving an overfitting test is not the same as making money.
Gort et al. 2022 [reports_underperformance] Reinforcement Learning Trading Policy
Gort et al. 2022 [contradicts] Zhang Zohren Roberts 2019
Transaction Costs and Slippage [opposes] Reinforcement Learning Trading Policy
Connections
- Markov Decision Process Trading Model — proposes_model, 2023, source: https://arxiv.org/abs/2209.05559
- Reinforcement Learning Trading Policy — tests_strategy, 2023, source: https://arxiv.org/abs/2209.05559
- Cryptocurrency Market — trades_market, 10 coins, 5-min data, May-Jun 2022, source: https://arxiv.org/abs/2209.05559
- Reinforcement Learning Trading Policy — reports_underperformance, best accepted PPO agent lost 34.96% in test window, source: https://arxiv.org/abs/2209.05559
- Overfitting in Quantitative Trading — suffers_overfitting_risk, DRL backtest profits framed as false positives from overfitting, source: https://arxiv.org/abs/2209.05559
- Probability of Backtest Overfitting — detects_regime, p estimated via combinatorial CV; agents with p>=10% rejected, source: https://arxiv.org/abs/2209.05559
- Combinatorial Purged Cross-Validation — replication_available, combinatorial cross-validation used to estimate overfitting, source: https://arxiv.org/abs/2209.05559
- Data-Snooping Bias — suffers_overfitting_risk, hyperparameter search can produce a ‘lucky’ overfitted agent, source: https://arxiv.org/abs/2209.05559
- Transaction Costs and Slippage — includes_costs, 0.3% fee per trade; slippage/market impact not modelled, source: https://arxiv.org/abs/2209.05559
- Sim-to-Real Gap — lacks_live_evidence, deployment to a real market explicitly left as future-conditional, source: https://arxiv.org/abs/2209.05559
- Zhang Zohren Roberts 2019 — contradicts, skeptical counterweight to the strongest positive RL backtest, source: https://arxiv.org/abs/2209.05559
Sources
- Gort, B. J. D., Liu, X.-Y., Sun, X., Gao, J., Chen, S., & Wang, C. D. (2023). “Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting.” arXiv:2209.05559 v6 — https://arxiv.org/abs/2209.05559
- FinRL_Crypto replication framework — https://github.com/berendgort/FinRL_Crypto