Millea 2021
“Deep Reinforcement Learning for Trading—A Critical Survey” by Adrian Millea (Imperial College London) is an open-access review published in Data (MDPI), 2021, 6(11), article 119 (DOI 10.3390/data6110119). It surveys the literature on deep reinforcement learning (DRL) for trading and, as the word “critical” in the title signals, frames itself as a diagnosis of the field’s problems rather than a celebration of its results. The survey’s two stated results are (1) to identify the most promising research directions and (2) “to show how a lack of consistency in the community can significantly impede research and the development of DRL agents for trading.” It is the vault’s primary evidence that the DRL-trading literature, taken as a whole, is an active research programme with a reproducibility deficit rather than a body of substantiated profitable systems.
Millea 2021 [defines] Reinforcement Learning Trading Policy
Adrian Millea [proposes_model] Millea 2021
The corpus was assembled by a transparent, reproducible procedure. Millea started from the 16,652 articles that cited the foundational Deep Q-Network paper as of 18 October 2021, narrowed these to 401 with the search words “trading market reward action state”, removed electricity-market and robotics papers to reach 182, and manually sifted 98 downloadable papers; an independent search yielded 38 more, for a total of 152 articles. The downloaded PDFs were then indexed with the Recoll desktop search tool, and occurrences of representative keywords were counted to classify each paper by action-space type (discrete vs continuous), state/time-series encoding, market type, input data frequency, and model-free vs model-based RL. Millea is candid that even this meta-analysis is noisy: “the following results should be taken with a grain of salt” because a keyword may appear as a reference rather than an actual use. That caveat is itself a statement about how hard the literature is to aggregate even at the descriptive level.
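The keyword-count classification described above can be sketched in a few lines. This is a minimal illustration, not Millea's actual pipeline: the survey used Recoll over PDFs, whereas this sketch assumes the text has already been extracted to a string, and the keyword groups, function names and sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword groups; the survey's exact keyword lists are not reproduced here.
KEYWORDS = {
    "action_space": ["discrete", "continuous"],
    "rl_family": ["model-free", "model-based"],
}

def count_keywords(text, keywords):
    """Count case-insensitive, word-bounded occurrences of each keyword in one document."""
    lowered = text.lower()
    counts = Counter()
    for kw in keywords:
        pattern = r"\b" + re.escape(kw.lower()) + r"\b"
        counts[kw] = len(re.findall(pattern, lowered))
    return counts

def classify(text, groups=KEYWORDS):
    """Per group, pick the most frequent keyword (None if absent; ties go to the first listed)."""
    labels = {}
    for group, kws in groups.items():
        counts = count_keywords(text, kws)
        best, n = max(counts.items(), key=lambda kv: kv[1])
        labels[group] = best if n > 0 else None
    return labels

# Invented sample text: "continuous" appears only in a cited title, not as a method.
sample = (
    "We use a discrete action space with a model-free DQN agent. "
    "Discrete actions are buy, hold and sell. "
    "References: Lillicrap et al., Continuous control with deep reinforcement learning."
)

print(dict(count_keywords(sample, KEYWORDS["action_space"])))
print(classify(sample))
```

The sample shows the noise Millea warns about: "continuous" is counted even though it occurs only inside a cited title, so raw counts can register a keyword that the paper never actually uses as a method.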
The central finding is methodological fragmentation. Papers use incompatible action spaces, state representations, markets, data frequencies and reward functions, so the corpus cannot be compared like-for-like and the profitability numbers reported across papers cannot be pooled into an evidence base. There is no shared benchmark dataset and no common evaluation protocol: trading has no counterpart to ImageNet in computer vision or the Atari suite in game-playing RL. The consequence is structural. When every paper trains on its own private data, with its own cost model, its own reward shaping and its own train/test split, a reader cannot tell whether a reported Sharpe ratio reflects a genuine method advance, a favourable dataset, or an unreported search over configurations. This is the link to Data-Snooping Bias and Overfitting in Quantitative Trading: a fragmented literature with no common test set is, in aggregate, an uncontrolled multiple-comparisons exercise, and the positive backtests that get published are a non-random, selection-biased sample of all backtests run.
Millea 2021 [supports] Data-Snooping Bias
Millea 2021 [supports] Overfitting in Quantitative Trading
Millea 2021 [contradicts] Reinforcement Learning Trading Policy
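The multiple-comparisons point can be made concrete with a small simulation. The sketch below is not taken from the survey; it assumes strategies whose daily returns are pure Gaussian noise with zero mean (so the true Sharpe ratio is zero), and the function names and parameter values are illustrative choices.

```python
import random
import statistics

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Sample Sharpe ratio (zero risk-free rate), scaled to annual terms."""
    mean = statistics.fmean(daily_returns)
    sd = statistics.stdev(daily_returns)
    return (mean / sd) * periods_per_year ** 0.5

def simulate_zero_skill_sharpes(n_strategies, n_days=252, daily_vol=0.01, seed=0):
    """Backtest n_strategies whose returns are zero-mean noise, i.e. no real edge."""
    rng = random.Random(seed)
    sharpes = []
    for _ in range(n_strategies):
        returns = [rng.gauss(0.0, daily_vol) for _ in range(n_days)]
        sharpes.append(annualized_sharpe(returns))
    return sharpes

sharpes = simulate_zero_skill_sharpes(n_strategies=200)
print(f"mean Sharpe across all runs: {statistics.fmean(sharpes):+.2f}")
print(f"best Sharpe across all runs: {max(sharpes):+.2f}")
```

The average across all runs sits near zero, as it should, while the single best run looks like a strong strategy. If only the best runs are written up and there is no common test set to re-score them on, a reader sees the maximum and not the distribution it was drawn from.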
The second finding is a reproducibility deficit, and it is the reason the survey connects to the Replication Crisis in Quantitative Finance. Millea reports that very few DRL-trading works release code; where code exists it is generally proprietary, and even open implementations are hard to reproduce because the key details (hyperparameters and the neural-network architecture) are not disclosed. A field whose results cannot be independently re-run cannot accumulate verified knowledge: a positive backtest that no one else can reproduce is a claim, not a finding. Because RL agents have thousands of tunable hyperparameter combinations and overfit readily, the absence of code and disclosed settings means the published positive results cannot be checked for the lucky-seed and tuned-on-the-test-set failure modes that the rest of this vault documents. Independent replication is the missing top tier of the vault’s profitability grading rubric, and Millea shows it is structurally absent across 152 papers.
Millea 2021 [supports] Replication Crisis in Quantitative Finance
Replication Crisis in Quantitative Finance [opposes] Out-of-Sample Backtesting
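As an illustration of the kind of disclosure whose absence Millea describes, the sketch below records the settings a third party would need to re-run an experiment and derives a fingerprint from them. It is an assumption-laden example, not a format proposed in the survey: the `RunManifest` class, its field names and every value shown are hypothetical placeholders.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class RunManifest:
    """Hypothetical minimal record of what a replication attempt would need."""
    algorithm: str
    hidden_layers: tuple   # network architecture, e.g. units per layer
    learning_rate: float
    discount: float
    seed: int
    train_period: tuple    # (start, end) as ISO date strings
    test_period: tuple
    cost_bps: float        # assumed transaction cost in basis points

def to_json(manifest):
    """Serialize with sorted keys so identical settings always give identical text."""
    return json.dumps(asdict(manifest), sort_keys=True, indent=2)

def fingerprint(manifest):
    """SHA-256 of the canonical JSON; changes if any disclosed setting changes."""
    return hashlib.sha256(to_json(manifest).encode("utf-8")).hexdigest()

# Placeholder values for illustration only; not drawn from any surveyed paper.
manifest = RunManifest(
    algorithm="DQN",
    hidden_layers=(64, 64),
    learning_rate=1e-4,
    discount=0.99,
    seed=0,
    train_period=("2015-01-01", "2018-12-31"),
    test_period=("2019-01-01", "2019-12-31"),
    cost_bps=5.0,
)

print(to_json(manifest))
print(fingerprint(manifest))
```

Publishing such a record alongside a reported backtest would let a reader distinguish a result that holds across seeds from one that depends on a single undisclosed configuration.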
It is important to be precise about what the survey does and does not claim. Millea is genuinely optimistic about research directions (hierarchical DRL, model-based “world-model” RL that learns a predictive model of the market, and risk-measure reward shaping), and the author’s own later work pursues exactly these. But the survey nowhere concludes that DRL trading is reliably profitable after costs and out-of-sample. It is a methodological map of an immature field, not an evidence-of-profit claim. That is precisely why its profitability_evidence_grade is inconclusive: a survey reviews; it does not test. The substance of this survey is that the underlying literature cannot currently support a profitability verdict at all. The honest reading for the vault is that the many positive deep-RL-trading backtests in circulation are individually un-replicated and collectively un-aggregable, so the field-level evidence for tradeable alpha remains absent.
Millea 2021 [relates] Sim-to-Real Gap
Millea 2021 [supports] Sun Wang An 2021
Connections
- Adrian Millea — proposes_model, 2021, source: https://www.mdpi.com/2306-5729/6/11/119
- Reinforcement Learning Trading Policy — replication_missing, 2021, source: https://www.mdpi.com/2306-5729/6/11/119
- Sun Wang An 2021 — compares_benchmark, 2021, source: https://www.mdpi.com/2306-5729/6/11/119 (companion survey reaching a convergent verdict on open challenges)
- Overfitting in Quantitative Trading — suffers_overfitting_risk, source: https://www.mdpi.com/2306-5729/6/11/119
- Data-Snooping Bias — suffers_overfitting_risk, source: https://www.mdpi.com/2306-5729/6/11/119
- Replication Crisis in Quantitative Finance — replication_missing, source: https://www.mdpi.com/2306-5729/6/11/119
- Sim-to-Real Gap — lacks_live_evidence, source: https://www.mdpi.com/2306-5729/6/11/119
- Out-of-Sample Backtesting — relates, source: https://www.mdpi.com/2306-5729/6/11/119