
markov-model — Vault Index
Research Goal
Goal: Determine whether Markov-based trading models can be substantiated as profitable trading approaches, separating genuine tradeable edge from academic backtest artefacts, overfitting and marketing claims, and distinguishing it from mere regime-classification usefulness.
Goal & Outcomes — Summary
Verdict — research complete (10 rounds, high confidence)
Markov-based models are substantiated as a regime-detection and risk-management component, not as standalone profitable trading systems. Regime classification reliably cuts volatility and drawdown; converting it into directional alpha that beats a benchmark after costs is not substantiated. No surveyed study reaches a strong evidence grade, and no credible public live track record exists; a deliberate steelman search did not overturn this.
| Item | Summary |
|---|---|
| Research question | Can Markov-based trading models produce profitable, repeatable outcomes after realistic costs and robustness checks? |
| Answer | Useful component (regime detection / risk control); not a proven standalone alpha source. |
| Models in scope | Markov chains · Hidden Markov Models · Markov regime-switching · Markov Decision Processes · reinforcement-learning policies |
| Definitive answer | → Conclusion |
| Goal & Outcomes page | → Outcomes |
→ Conclusion — the definitive, fully-argued verdict. · → Outcomes — round-by-round findings and evidence by model family.
Seed entities (round 1)
- Major Equity Indices Daily Returns — Daily return series of major US, German and Japanese equity indices (S&P 500, DJIA, NASDAQ 100, DAX, Nikkei 225) used to backtest regime-…
- Options Market — The market for exchange-traded options — the asset class for the American-option MDP trading model of Nasir et al.
- US Equity Market — The US stock market — the venue for the portfolio-transaction execution problems studied via MDP and dynamic programming.
- Hidden Markov Model Regime Detection — HMM infers hidden bull/bear or low/high-vol market regimes from observed returns; mostly a risk filter, with weak standalone alpha after…
- Markov Chain Trading Model — Discrete-state Markov chain estimating a price-state transition matrix to predict the next up/flat/down state directly as a trading signal.
- Markov Decision Process Trading Model — Frames trading and execution as a sequential decision problem — state, action, reward, transition, policy — solved by dynamic programming.
- Markov Regime-Switching Model — Econometric model (Hamilton 1989 lineage) whose AR/volatility parameters switch with a latent Markov state; used for regime detection and…
- Reinforcement Learning Trading Policy — RL agents that learn a trading policy by trial-and-error reward maximisation; strong backtest literature but weak post-cost, out-of-sampl…
- Curse of Dimensionality — Exact dynamic-programming solution of an MDP becomes computationally infeasible as state and action spaces grow.
- Lookahead Bias from Smoothed Regime Estimates — Using full-sample smoothed regime labels in a backtest leaks future data into past decisions, inflating apparent profitability.
- Non-Stationarity — Market dynamics change over time, so a fixed MDP transition kernel and learned policy degrade out-of-sample.
- Parameter Instability and Estimation Noise — Regime-switching model parameters are estimated from few in-regime observations and a multi-modal likelihood, making them noisy and unsta…
- Partial Observability — Latent variables make the true MDP state unobservable, breaking the Markov property and turning the problem into a POMDP.
- Real-Time Regime Identification Lag — Real-time (filtered) regime inference lags ex-post (smoothed) inference by days to weeks; small misclassification erases any forecasting…
- Regime Misclassification — The HMM assigns the wrong regime to a period, causing detrimental allocation; errors spike at window edges and during oscillating markets.
- State-Space Design — The choice of which variables enter the MDP state is unprincipled; a poor state vector breaks the Markov property or invites overfitting.
- Buy-and-Hold Benchmark — The passive baseline of holding an asset for the whole period; the profitability bar a Markov-chain trading strategy must beat net of costs.
- Random Walk Benchmark — The naive baseline that next-period price moves are unpredictable; the minimum bar any Markov-chain price predictor must clear.
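The Markov Chain Trading Model entry above (discrete up/flat/down states, an estimated transition matrix, next-state prediction) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any surveyed paper: the three-state scheme, the hypothetical ±0.1% `flat_band` threshold and the uniform fallback for never-visited states are choices made here. Transition probabilities are the maximum-likelihood estimates, i.e. row-normalised transition counts.

```python
import numpy as np

def discretise(returns, flat_band=0.001):
    """Map returns to states 0 = down, 1 = flat, 2 = up.

    flat_band is a hypothetical threshold: |r| <= flat_band counts as flat.
    """
    r = np.asarray(returns, dtype=float)
    states = np.ones(len(r), dtype=int)  # default: flat
    states[r > flat_band] = 2
    states[r < -flat_band] = 0
    return states

def transition_matrix(states, n_states=3):
    """Maximum-likelihood transition matrix: row-normalised transition counts."""
    counts = np.zeros((n_states, n_states))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows never visited fall back to a uniform distribution (an assumption).
    return np.divide(counts, row_sums,
                     out=np.full_like(counts, 1.0 / n_states),
                     where=row_sums > 0)

def predict_next_state(P, current_state):
    """Most likely next state given the current one (argmax of its row)."""
    return int(np.argmax(P[current_state]))

# Usage with a short, made-up return series.
states = discretise([0.004, 0.006, -0.002, 0.003, 0.005, 0.0, -0.004, 0.002])
P = transition_matrix(states)
# P[2] is [0.25, 0.25, 0.5], so after an "up" day the model predicts "up" (state 2).
next_state = predict_next_state(P, states[-1])
```

Any such forecast still has to beat the Random Walk Benchmark and, net of costs, the Buy-and-Hold Benchmark; the choice of `flat_band` is exactly the kind of arbitrariness described under State Definition Arbitrariness.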
People (researched — rounds 2–7)
- Adrian Millea — Imperial College London researcher; author of the critical survey that documents the reproducibility and consistency deficit in deep-RL-t…
- Andrea Frazzini — Quantitative researcher at AQR Capital Management, co-author of empirical work measuring real-world trading costs of asset-pricing anomal…
- Antoni Wiliński — Polish professor of computer science / quantitative finance; author of the heterogeneous-transition-matrix Markov chain trading paper (Wi…
- Campbell Harvey — Finance professor at Duke University and former editor of the Journal of Finance; leading researcher on multiple testing and the ‘factor…
- David H. Bailey — Mathematician (Lawrence Berkeley National Laboratory, UC Davis); co-author of the backtest-overfitting and Deflated Sharpe Ratio literature.
- Halbert White — Econometrician (UC San Diego, 1950-2012); author of the ‘Reality Check for Data Snooping’, the first general test correcting for specific…
- James D. Hamilton — Econometrician (UC San Diego) whose 1989 Econometrica paper founded the Markov regime-switching model family; author of the standard Time…
- John Moody — Computational-finance and machine-learning researcher; originator, with collaborators, of recurrent reinforcement learning (RRL) for trad…
- John Mulvey — Princeton ORFE professor and co-author of the Shu, Yu & Mulvey regime-switching asset-allocation papers using statistical jump models.
- Marcos López de Prado — Quant researcher (ADIA, formerly Guggenheim Partners; Cornell) who formalised backtest overfitting, the Deflated Sharpe Ratio and combina…
- Mark Kritzman — Quant practitioner: CEO of Windham Capital and MIT Sloan lecturer; co-developed the financial turbulence index and Markov regime-shif…
- Matthew Saffell — Machine-learning researcher; co-author with John Moody of the foundational recurrent reinforcement learning trading papers and the Differ…
- Neil Chriss — Mathematician and quantitative-finance practitioner; co-author with Robert Almgren of the foundational 2000 optimal-execution paper.
- Peter Nystrup — Quantitative researcher (DTU / Lund) who originated the statistical jump model for persistent-regime detection and its sparse, feature-se…
- Petter Kolm — Quantitative finance professor at NYU Courant; co-author across the sparse and continuous statistical jump model papers connecting the ju…
- Richard Bellman — Mathematician who founded dynamic programming and the Bellman equation — the solution machinery underneath every MDP.
- Robert Almgren — Applied mathematician and market-microstructure researcher; co-originator of the Almgren-Chriss optimal-execution framework and of direct…
- Tobias Moskowitz — Financial economist (Yale School of Management, NBER, AQR), co-author of empirical work on the trading costs of asset-pricing anomalies.
- Valeriy Zakamulin — Finance professor at the University of Agder; quantitative researcher and skeptic of market-timing and moving-average trading claims.
- Yan Liu — Finance academic; co-author with Campbell Harvey of the multiple-testing framework for evaluating trading strategies and the ‘factor zoo’…
People (researched — round 9)
- Cliff Asness — Co-founder of AQR Capital Management; author of ‘The Siren Song of Factor Timing’, the vault’s reference skeptic on factor timing.
Organisations (researched — round 4)
- Quoniam Asset Management — Frankfurt-based quantitative asset manager; affiliation of co-author Sascha Mergner on Bulla et al. 2010.
Organisations (researched — rounds 6–7)
- AQR Capital Management — Large quant manager whose live-trade data underpins the leading real-world cost estimate, and which publicly discloses skepticism toward…
- BlackRock — World’s largest asset manager; publicly frames a ‘new regime’ and runs regime-adaptive macro ML models, but discloses these as risk-aware…
- Bridgewater Associates — World’s largest hedge fund; its publicly disclosed economic-environment (‘four boxes’) framework is regime classification used for portfo…
- CFA Institute — Global association of investment professionals; publisher of the Financial Analysts Journal — a peer-reviewed practitioner-academic venue…
- Citigroup — Global investment bank whose US equity trading-desk data was used to empirically estimate the equity market-impact functions in Almgren e…
- State Street Associates — Research arm of State Street Global Markets; institutional source of named-quant regime, turbulence and business-cycle research feeding t…
Organisations (researched — round 9)
- Man Group — Listed alternative-investment manager; its Man AHL / Man Numeric quant units publish named-author regime research (the ‘Regimes’ paper, M…
- PIMCO — Global fixed-income manager; publicly organises its process around macroeconomic regimes via the Secular/Cyclical Outlook framework — a q…
- Two Sigma — Quantitative hedge fund built on machine learning; publishes regime/factor-environment commentary (Street View, Venn factor lens) but dis…
Concepts (researched — rounds 2–10)
- AI Hedge Fund Index Underperformance — The Eurekahedge AI Hedge Fund Index of ML-driven funds has underperformed passive benchmarks in live data — the aggregate real-money evid…
- Backtest-to-Live Performance Gap — The systematic, documented deterioration of trading strategies between published backtest and live deployment — the reason this vault tre…
- Bellman Equation — Recursive optimality relation expressing a state’s optimal value as best immediate reward plus discounted value of the successor state.
- Combinatorial Purged Cross-Validation — López de Prado’s CV method that purges leaking samples, embargoes adjacent ones, and combinatorially builds many backtest paths to yield…
- Currency Carry Trade — FX strategy of borrowing low-interest-rate currencies to fund high-interest-rate ones; profitable on average but exposed to sudden negati…
- Deflated Sharpe Ratio — Bailey & López de Prado’s adjusted Sharpe ratio that deflates an observed result for the number of trials and for non-normal returns, exp…
- Differential Sharpe Ratio — An online, per-step approximation of the Sharpe ratio built from exponential moving averages of returns; the standard risk-adjusted rewar…
- Dynamic Programming — Bellman’s method for solving multi-stage decision problems by recursion on the value function — the classical, model-based way to solve a…
- Factor Timing — Dynamically tilting a portfolio toward equity factors expected to outperform — a practice whose reliability is contested (‘the siren song’).
- Financial Turbulence Index — Mahalanobis-distance measure of statistical ‘unusualness’ in asset returns (Kritzman & Li 2010); a regime/outlier risk indicator used to…
- Goodhart’s Law — When a measure becomes a target it ceases to be a good measure — the principle behind reward and metric misspecification.
- Hawkes Process — A self-exciting point process whose intensity depends on the history of past events — the standard non-Markovian model of clustered order…
- Implementation Shortfall — The gap between a trade’s notional value at the decision (arrival) price and the value actually achieved after impact, fees and timing —…
- Live Regime-Model Evidence Gap — The gap between disclosed institutional use of regime CLASSIFICATION (real) and any public live track record of a standalone profitable M…
- Live Trading Evidence — Disclosed, costed, real-capital track records — the missing top tier of trading-strategy evidence; no credible live Markov-model record h…
- Maximum Drawdown — The largest peak-to-trough loss of an equity curve; the risk denominator of the Calmar ratio and a core backtest risk metric.
- Maximum Likelihood Estimation — Estimating model parameters by maximising the probability the model assigns to the observed data; the principle Baum-Welch implements for…
- Median Filter Smoothing — Post-processing step that smooths a noisy regime-signal sequence to suppress short-lived flips and reduce turnover.
- Out-of-Sample Backtesting — Evaluating a trading model on data not used to fit it — the imperfect primary defence against overfitting and the core evidence standard…
- Recent Developments 2024-2025 — 2024-2025 work on neural regime detection, FinRL benchmarks and generative RL — better tooling and honest benchmarks, but the same unsolv…
- Regime Classification — Labelling the market’s current latent state (bull/bear, calm/crisis, low/high-vol) — well-evidenced as a risk filter, far weaker as a sta…
- Replication Crisis in Quantitative Finance — Difficulty independently re-running and confirming published quantitative-trading results — driven by missing code, undisclosed settings,…
- Square-Root Law of Market Impact — The empirically observed regularity that the price impact of a large metaorder scales as the square root of order size relative to traded…
- Steelman — Best Case for Markov Trading Models — The strongest honest case FOR Markov/HMM/regime/RL trading profitability — collected to stress-test the vault’s skeptical conclusion; the…
- Tactical Asset Allocation — Actively tilting portfolio exposure on short-horizon signals (often regime or trend) to avoid tail risk — the practical setting where reg…
- Value Iteration — Dynamic-programming algorithm that solves an MDP by repeatedly applying the Bellman optimality operator until the value function converges.
- Volatility Regime — A market regime defined by the variance of returns (low-variance vs high-variance) rather than by directional return sign.
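The Bellman Equation, Dynamic Programming and Value Iteration entries describe one piece of machinery: repeatedly apply the Bellman optimality operator until the value function stops changing. Below is a minimal value-iteration sketch for a finite MDP. The two-regime, two-action example (calm/turbulent regimes; cash/invested actions) and its transition probabilities and rewards are hypothetical numbers chosen only to make the mechanics visible; exact tabular solution like this is only practical for tiny state spaces, which is the Curse of Dimensionality noted in the seed entities.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-10, max_iter=10_000):
    """Solve a finite MDP by repeatedly applying the Bellman optimality operator.

    P: array (n_actions, n_states, n_states), P[a, s, s'] = transition probability.
    R: array (n_states, n_actions), expected immediate reward.
    Returns the (approximately) optimal value function V and a greedy policy.
    """
    n_actions, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] * V[s']
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value function.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy problem: regimes 0 = calm, 1 = turbulent; actions 0 = cash, 1 = invested.
regime_P = np.array([[0.9, 0.1], [0.3, 0.7]])
P = np.stack([regime_P, regime_P])        # regime dynamics do not depend on the action
R = np.array([[0.0, 1.0],                 # calm: cash earns 0, invested earns +1
              [0.0, -2.0]])               # turbulent: cash earns 0, invested loses 2
V, policy = value_iteration(P, R)
# policy is [1, 0]: invest when calm, hold cash when turbulent.
```

Because the regime transitions here ignore the action, the optimal policy is simply greedy on immediate reward; execution and market-making MDPs are harder precisely because the action changes the next state.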
General (researched — round 10)
- Conclusion — Verdict: Markov-based trading models are a useful regime/risk-management component, not a substantiated standalone profitable trading sys…
Backtest Results (researched — round 9)
- Calmar Ratio — Risk-adjusted performance metric (annualised return over maximum drawdown) used by Wiliński 2019 to report Markov-chain trading profitabi…
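Maximum Drawdown and the Calmar Ratio, as defined in this vault (annualised return over maximum drawdown), reduce to a few array operations on an equity curve. A minimal sketch, assuming geometric annualisation with a caller-supplied `periods_per_year`; the exact annualisation convention Wiliński 2019 used is not recorded here, so treat that part as an assumption.

```python
import numpy as np

def max_drawdown(equity):
    """Largest peak-to-trough loss of an equity curve, as a positive fraction."""
    equity = np.asarray(equity, dtype=float)
    running_peak = np.maximum.accumulate(equity)
    drawdowns = 1.0 - equity / running_peak
    return float(drawdowns.max())

def calmar_ratio(equity, periods_per_year=252):
    """Annualised (geometric) return divided by maximum drawdown.

    periods_per_year is an assumption about the sampling frequency (252 = daily).
    """
    equity = np.asarray(equity, dtype=float)
    n_periods = len(equity) - 1
    annual_return = (equity[-1] / equity[0]) ** (periods_per_year / n_periods) - 1.0
    mdd = max_drawdown(equity)
    return float("inf") if mdd == 0 else annual_return / mdd

# Usage: four quarterly observations after the start value.
equity = [100, 120, 90, 110, 130]
mdd = max_drawdown(equity)                       # 0.25 (peak 120 to trough 90)
calmar = calmar_ratio(equity, periods_per_year=4)  # 30% annual return / 25% drawdown
```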
Datasets (researched — round 4)
- QuantConnect — Open algorithmic-backtesting and trading platform used to build and backtest the factor models in Wang Lin Mikhelson 2020.
Datasets (researched — round 8)
- Pinnacle Data Corp CLC Database — Commercial database of ratio-adjusted continuous futures contracts; the 50-contract universe used in the vault’s strongest positive RL-tr…
Datasets (researched — round 10)
- FinRL Framework and Contests — Open-source deep-RL trading framework and its 2023-2025 benchmarking contests; standardised MDP environments and out-of-sample evaluation…
Markets (researched — round 6)
- Limit Order Book — The order-matching mechanism of modern electronic markets — the environment most MDP execution and market-making models trade within, and…
Markets (researched — round 8)
- Cryptocurrency Market — Volatile 24/7 digital-asset market; the dominant RL/Markov trading testbed, where backtest overfitting inflates positive results and cras…
- DAX — German blue-chip equity index; a deeply liquid non-US robustness market for HMM regime-switching strategies.
- EUR USD Currency Pair — The euro/US-dollar pair — the single most liquid market in the world; the canonical low-cost test ground for Markov-chain and RL FX tradi…
- Futures Markets — Exchange-traded standardised futures across commodities, equity indices, fixed income and FX — the test universe for the strongest positi…
- Nikkei 225 — Japanese equity index; the deep-liquidity test market where HMM regime filtering most clearly improved Sharpe by avoiding Japan’s prolong…
- OMXS30 Index — Benchmark index of the 30 most-traded Nasdaq Stockholm stocks; a smaller developed-market test bed where higher relative costs would erod…
- S&P 500 — US large-cap equity index; the deepest-liquidity, longest-history test market for HMM regime-detection and regime-switching trading studies.
- WIG20 Index — Benchmark index of 20 large Warsaw Stock Exchange companies; a smaller emerging-market equity testbed where higher costs undercut Wilinsk…
Models (researched — round 3)
- Statistical Jump Model — Non-parametric regime model that enforces persistence via an explicit jump penalty; calmer regimes and lower turnover than the HMM, but i…
Models (researched — rounds 9–10)
- Baum-Welch Estimation — The Expectation-Maximisation algorithm that fits HMM transition and emission parameters by maximum likelihood from an observed return seq…
- Forward-Backward Algorithm — Dynamic-programming recursion computing per-state posterior probabilities; the inference engine inside Baum-Welch’s E-step.
- Hidden Semi-Markov Model — Generalisation of the HMM that models explicit state-duration (sojourn) distributions instead of assuming geometric regime lengths.
- K-Means Regime Clustering — Unsupervised clustering (k-means and fuzzy variants) used to partition market or macroeconomic data into regimes — a machine-learning alt…
- Neural Regime Model — 2024-2025 neural-network-augmented regime detectors — neural HMMs, attention/transformer regime models, LLM-assisted classifiers; richer…
- Similarity-Based Regime Detection — Non-parametric regime detection that classifies the current period by distance/similarity to historical periods, without presupposing a f…
- Viterbi Decoding — Dynamic-programming algorithm that recovers the single most probable global sequence of hidden regimes from a fitted HMM.
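The Forward-Backward Algorithm entry, together with the Lookahead Bias from Smoothed Regime Estimates and Real-Time Regime Identification Lag entries, turns on the difference between filtered probabilities P(state_t | data up to t) and smoothed probabilities P(state_t | all data). The sketch below runs a scaled forward-backward pass for a two-state HMM with Gaussian emissions and returns both. The parameters are hypothetical and assumed known; in practice Baum-Welch estimates them, and fitting those estimates on the full sample is itself a source of lookahead.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Normal density, vectorised over broadcastable arrays."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def filtered_and_smoothed(returns, A, pi, means, stds):
    """Scaled forward-backward pass for an HMM with Gaussian emissions.

    A[i, j] = P(state_t = j | state_{t-1} = i); pi = initial state distribution.
    Returns (filtered, smoothed), each of shape (T, n_states):
      filtered[t] = P(state_t | r_1..r_t)  -> available in real time
      smoothed[t] = P(state_t | r_1..r_T)  -> uses future data (lookahead)
    """
    r = np.asarray(returns, dtype=float)
    A, pi = np.asarray(A, dtype=float), np.asarray(pi, dtype=float)
    T, K = len(r), len(pi)
    # Emission likelihoods B[t, k] = p(r_t | state k)
    B = gaussian_pdf(r[:, None], np.asarray(means)[None, :], np.asarray(stds)[None, :])

    # Forward pass, normalised each step so alpha[t] is the filtered distribution.
    alpha = np.zeros((T, K))
    scale = np.zeros(T)
    alpha[0] = pi * B[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward scaling constants.
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1]) / scale[t + 1]

    smoothed = alpha * beta
    smoothed /= smoothed.sum(axis=1, keepdims=True)
    return alpha, smoothed

# Hypothetical calm (state 0) / turbulent (state 1) model and made-up daily returns.
A = np.array([[0.98, 0.02], [0.05, 0.95]])
pi = np.array([0.5, 0.5])
rets = [0.002, -0.001, 0.003, -0.04, -0.03, 0.035, -0.02, 0.001, 0.002, 0.0015]
filtered, smoothed = filtered_and_smoothed(rets, A, pi,
                                           means=[0.0005, -0.001], stds=[0.008, 0.025])
```

Only `filtered` may drive a backtest decision at time t; the two coincide at the final observation, and the gap between them earlier in the sample is the lookahead a smoothed-label backtest silently exploits.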
Papers (researched — rounds 2–10)
- Abbade and Reali Costa 2026 — Builds RL trading environments with nonlinear market-impact models; shows flat 10bp fees inflate backtests and the cost model changes whi…
- Almgren Chriss 2000 — Foundational optimal-execution model: liquidate a fixed position over a horizon as a mean-variance trade-off between market-impact cost a…
- Almgren Thum Hauptmann Li 2005 — Empirically fits a permanent+temporary equity market-impact model to Citigroup trade data; rejects pure square-root for a 3/5 power law o…
- And the Cross-Section of Expected Returns — Documents the ‘factor zoo’ and shows the conventional t>2.0 significance hurdle is far too lax under collective data snooping.
- Ang and Timmermann 2012 — Ang & Timmermann’s canonical survey of regime-switching in finance — defines regimes and frames their portfolio value as risk management,…
- Aronsson Folkesson 2023 — KTH 2023 BSc degree project applying first/second-order discrete Markov chains to OMXS30 for next-day state prediction; barely beats rand…
- Avellaneda-Stoikov 2008 — Foundational stochastic-control model of market making: inventory-skewed reservation prices and a spread calibrated to Poisson order arri…
- Aydınhan Kolm Mulvey Shu 2024 — Extends the jump model to a continuous probability-vector regime representation (CJM) with a mode-loss penalty; methodology extension by…
- Baitinger & Hoch 2024 — Baitinger & Hoch (2024): the complex HSMM beats the plain HMM only in-sample; the advantage ‘largely disappears’ out-of-sample, and more…
- Bandarupalli 2025 — Cost-aware out-of-sample PPO RL agent on BTC/ETH/SPY underperformed buy-and-hold (Sharpe 1.23 vs 1.46) — RL crypto trading fails to beat…
- Bazzi Blasques Koopman Lucas 2017 — Methodological paper proposing score-driven time-varying transition probabilities for Markov regime-switching models — the econometric re…
- Borrageiro Firoozye Barucca 2022 — Modern peer-reviewed direct-RRL FX agent; 7-year out-of-sample test net of transaction and funding costs yields only a 0.52 information r…
- Bosancic Nie Mulvey 2024 — Regime-aware factor allocation paper that applies statistical jump models with optimal feature selection to long-only equity factor portf…
- Bouye and Teiletche 2025 — CFA Institute / Financial Analysts Journal 2025 paper modelling economic regimes as mixtures of distributions for strategic asset allocat…
- Brunnermeier Nagel Pedersen 2008 — NBER study showing FX carry-trade returns are negatively skewed and crash-prone — investment currencies fall sharply when funding liquidi…
- Bulla et al. 2010 — Bulla et al.’s out-of-sample test of a Markov-switching equity timing strategy; profitable after 10bp costs, but the edge is mostly ~41%…
- Chappell 2018 — Among the first papers to apply Markov regime-switching/HMM to Bitcoin returns; a 5-state model best fits crypto volatility regimes — reg…
- Chen and Zimmermann 2022 — Open-source replication of 319 cross-sectional return predictors; reproduces nearly all in-sample, and (with later work) confirms sharp p…
- Cortese Kolm Lindström 2023 — Applies the sparse statistical jump model to cryptocurrency returns; identifies a three-state bull/neutral/bear structure and its drivers…
- Dacco and Satchell 1999 — Dacco & Satchell prove analytically that a small real-time regime-misclassification rate makes even the TRUE regime-switching model forec…
- Ding Granger Engle 1993 — Canonical empirical paper documenting long-memory dependence in absolute stock returns — the persistence a memoryless first-order Markov…
- Do Carmo 2017 — 2017 Lisbon master’s dissertation reconstructing equity return processes as N-symbol K-length discrete Markov chains; beats random on an…
- Frazzini Israel and Moskowitz 2018 — Measures real-world trading costs from $1tn of live AQR trades; finds costs ~9-11bp and that size/value/momentum survive, short-term reve…
- Goldfeld and Quandt 1973 — Goldfeld & Quandt’s 1973 paper introducing Markov-switching regression — the direct precursor Hamilton 1989 extended to autoregressive mo…
- Gort et al. 2022 — Hypothesis-test framework to reject overfitted DRL crypto agents via combinatorial cross-validation; the least-overfitted accepted PPO ag…
- Hambly Xu Yang 2023 — Survey establishing the Markov Decision Process as the standard setting for reinforcement learning across finance decision problems.
- Hamilton 1989 — Hamilton’s 1989 Econometrica paper introducing autoregressive models with Markov-switching parameters — the founding paper of the regime-…
- Kritzman and Li 2010 — Kritzman & Li’s FAJ paper defining the financial turbulence index — a Mahalanobis-distance outlier measure used to scale risk, not a trad…
- Kritzman Page Turkington 2012 — CFA Institute / FAJ 2012 paper applying Markov-switching models to forecast regimes in turbulence, inflation and growth for dynamic asset…
- Lalor Swishchuk 2025 — Deep-RL market-making study under semi-Markov and Hawkes dynamics — names the non-Markov problem and the phantom-gain backtest artefacts…
- Macri et al 2025 — Macri et al. show that embedding latent-Markov-regime probabilities into a DDPG+GRU RL trader improves profit and robustness — but the ev…
- Macroeconomic Regimes and Regime Shifts — Hamilton’s NBER WP 21863 survey of regime-switching econometrics — formulation, filtered vs smoothed inference, regime-count testing, mul…
- McLean and Pontiff 2016 — Replicates 97 published return predictors; finds anomaly returns 26% lower out-of-sample and 58% lower post-publication — the headline me…
- Mettle et al 2024 — Peer-reviewed 2024 paper modelling monthly returns of five national markets as a Markov chain random walk for descriptive risk ranking; n…
- Millea 2021 — Critical survey of 152 deep-RL-trading papers; documents methodological fragmentation and a reproducibility deficit that block aggregatin…
- Moody and Saffell 2001 — Foundational 2001 IEEE paper presenting recurrent reinforcement learning (RRL) and the Differential Sharpe Ratio for trading; encouragin…
- Moody Wu Liao Saffell 1998 — Earliest full recurrent reinforcement learning trading paper; introduced the Differential Sharpe Ratio and the S&P 500 1970-1994 out-of-s…
- Mulliner et al. 2025 — Man Group / Harvey ‘Regimes’ paper: a non-parametric similarity-based regime model times six equity factors; reports a 0.82-Sharpe anti-r…
- Nasir et al 2021 — Builds a model-based MDP for American-option trading, solved by value iteration on Microsoft and Coca-Cola options data.
- Nystrup Kolm Lindström 2021 — Introduces the sparse statistical jump model — joint feature selection plus jump-penalised clustering — robust to noise; a methodology pa…
- Nystrup Lindström Madsen 2020 — Originating paper for the statistical jump model — a jump-penalised HMM estimator; shows better persistence estimates cut trading costs,…
- Oliveira et al. 2025 — Oliveira et al. classify regimes by modified k-means over FRED-MD macro data and report regime-conditioned ETF portfolios beating SPY — b…
- Patton and Weller 2017 — Measures the gap between on-paper factor returns and what mutual funds deliver; implementation costs 2.2-8.5%/yr and momentum net return…
- Paykan et al. 2025 — DRL (SAC/DDPG+LSTM) crypto portfolio agent beats a Markowitz benchmark but with a -40.9% drawdown and no hard buy-and-hold benchmark — a…
- Pedersen 2023 — Compares analytical, numerical, dynamic-programming and RL solutions to the Almgren-Chriss optimal-execution MDP.
- Probability of Backtest Overfitting — Introduces PBO and the CSCV method that estimates the probability a backtest is overfit from only a matrix of returns.
- Pseudo-Mathematics and Financial Charlatanism — Foundational proof that backtest overfitting is near-unavoidable and, under memory effects, produces negative out-of-sample returns.
- Quandt 1958 1972 — Quandt’s switching-regression papers (1958, 1972) — the earliest two-regime regression models that pre-date and feed into the Markov regi…
- Shu and Mulvey 2024 Dynamic Factor Allocation — Shu & Mulvey use a sparse jump model + Black-Litterman to dynamically allocate across six factor ETFs; OOS 2007-2024 with costs, IR rises…
- Shu Yu and Mulvey 2024 — Shu, Yu & Mulvey benchmark HMM vs statistical jump model regime signals on equity indices 1990-2023 with costs; the jump model wins on tu…
- Shu Yu and Mulvey 2024 Dynamic Allocation — Shu, Yu & Mulvey’s 12-asset regime-aware allocation paper; jump-model + XGBoost regime forecasts lift OOS Sharpe and halve drawdown after…
- Stop-Loss Regime Switching 2018 — Elsevier paper deriving closed-form returns of stop-loss/regime-exit rules; tight rules underperform buy-and-hold on US stocks due to exc…
- Sun Wang An 2021 — Comprehensive survey of 100+ RL-for-quantitative-trading papers; builds a four-task taxonomy and states the field’s unsolved open challen…
- The 10 Reasons Most Machine Learning Funds Fail — López de Prado’s practitioner catalogue of ten recurring mistakes — research-through-backtesting, walk-forward overfitting, undisclosed t…
- Wang Lin Mikhelson 2020 — Wang, Lin & Mikhelson (2020): a 3-state HMM rotates style-factor models; reports a 2.0 Sharpe out-of-sample, but with no costs, leverage,…
- Wilinski 2019 — Expert Systems with Applications 2019 paper: heterogeneous rolling-window Markov chain; peer-reviewed profit claim but ML-tuned, no discl…
- Zakamulin 2016 — Zakamulin shows a celebrated moving-average timing strategy’s alpha is pure look-ahead bias; corrected, it is no better than buy-and-hold.
- Zhang Zohren Roberts 2019 — Oxford-Man Institute study training DQN/PG/A2C agents on 50 futures contracts; reports out-of-sample risk-adjusted outperformance over mo…
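Kritzman and Li 2010 define financial turbulence as the squared Mahalanobis distance of the current return vector from its historical mean, scaled by the historical covariance: `d_t = (y_t - mu)' inv(Sigma) (y_t - mu)`. A minimal NumPy sketch follows; estimating `mu` and `Sigma` from a single caller-supplied history window is an assumption of this sketch, not necessarily the paper's estimation scheme.

```python
import numpy as np

def turbulence(history, current):
    """Squared Mahalanobis distance of return vectors from a historical distribution.

    history: (T, n_assets) returns used to estimate the mean and covariance.
    current: (m, n_assets) or (n_assets,) return vectors to score.
    Returns one turbulence value per row of `current`.
    """
    history = np.asarray(history, dtype=float)
    current = np.atleast_2d(np.asarray(current, dtype=float))
    mu = history.mean(axis=0)
    cov = np.cov(history, rowvar=False)      # sample covariance (ddof=1)
    cov_inv = np.linalg.inv(cov)
    diff = current - mu
    # d_i = diff_i' * cov_inv * diff_i for every row i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Usage with simulated (not real) returns for three assets.
rng = np.random.default_rng(0)
hist = rng.normal(0.0, 0.01, size=(500, 3))
calm_score = turbulence(hist, [0.0, 0.0, 0.0])[0]
stress_score = turbulence(hist, [0.1, -0.1, 0.1])[0]   # far larger than calm_score
```

Averaged over the rows of the history itself, the score equals `n_assets * (T - 1) / T`, a convenient sanity check. High values flag statistically unusual periods, which fits the vault's reading of the index as a risk-scaling indicator rather than a trading signal.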
Risks and Failure Modes (researched — round 2)
- Data-Snooping Bias — Inflated apparent performance from testing many models/strategies on the same data and reporting only the winners; the multiple-testing f…
- Overfitting in Quantitative Trading — Fitting a model to noise in historical data so a backtest looks profitable but the edge fails out-of-sample; under serial dependence it p…
- Transaction Costs and Slippage — Commissions, spreads, market impact and slippage that erode or erase the gross profits of Markov trading models, especially high-turnover…
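Data-snooping bias is what the Deflated Sharpe Ratio (see Concepts) is meant to correct. The sketch below follows the commonly cited Bailey & López de Prado construction: first the expected maximum Sharpe ratio `SR0` among `n_trials` unskilled strategies, then the probability that the observed Sharpe ratio exceeds `SR0`, adjusted for track-record length, skewness and kurtosis. Inputs are per-period (not annualised) Sharpe ratios, `kurtosis` is raw (3.0 for normal returns), `n_trials` must be at least 2, and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials, sr_variance):
    """Expected maximum Sharpe ratio across n_trials independent unskilled trials (SR0)."""
    z = NormalDist()
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * z.inv_cdf(1 - 1 / n_trials)
        + EULER_GAMMA * z.inv_cdf(1 - 1 / (n_trials * math.e))
    )

def deflated_sharpe_ratio(sr_hat, n_obs, skew, kurtosis, n_trials, sr_variance):
    """Probability that the true Sharpe ratio exceeds the multiple-testing benchmark SR0.

    sr_hat: observed per-period Sharpe ratio of the selected strategy.
    n_obs: number of return observations behind sr_hat.
    sr_variance: variance of the Sharpe ratios across all trials that were run.
    """
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    denom = math.sqrt(1 - skew * sr_hat + (kurtosis - 1) / 4 * sr_hat ** 2)
    return NormalDist().cdf((sr_hat - sr0) * math.sqrt(n_obs - 1) / denom)

# Same backtest (daily SR 0.1 over ~5 years), judged after 2 trials vs after 100 trials.
dsr_few = deflated_sharpe_ratio(0.1, 1250, -0.5, 5.0, n_trials=2, sr_variance=0.0025)
dsr_many = deflated_sharpe_ratio(0.1, 1250, -0.5, 5.0, n_trials=100, sr_variance=0.0025)
```

With these inputs the identical backtest looks convincing after 2 trials but falls below a coin-flip after 100, which is the multiple-testing point behind the Data-Snooping Bias entry.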
Risks and Failure Modes (researched — rounds 6–7)
- Adverse Selection — The market maker’s risk of being systematically filled by better-informed counterparties — the core cost of liquidity provision and a rou…
- Phantom Gains in Backtests — Inflated backtest P&L produced when a market-making simulator omits adverse fills, queue position, or bid/ask (not mid-price) execution.
- Sim-to-Real Gap — The discrepancy between an agent’s performance in a historical-data simulation and in a live market — the reinforcement-learning-specific…
Risks and Failure Modes (researched — rounds 9–10)
- First-Order Memory Assumption — The Markov property assumption that the next price state depends only on the current state, discarding trend and longer-range dependence.
- HMM Parameter Instability — HMM parameter estimates are noisy and unstable on short or imbalanced samples, especially data-starved bear regimes, degrading regime inf…
- Non-Stationary Transition Matrix — The core failure mode of Markov-chain trading: estimated transition probabilities drift over time, so a fixed matrix misforecasts the nex…
- Reward Design Sensitivity — RL trading results depend heavily on the choice of reward function and hyperparameters, making reported edges fragile and easy to overfit.
- Reward Specification Error — A mis-specified MDP reward is silently optimised, producing policies that maximise the proxy rather than genuine risk-adjusted profit.
- State Definition Arbitrariness — Markov-chain results depend heavily on the arbitrary choice of how many price states to use and where to set the bucket thresholds.
- State-Count Selection — Choosing how many hidden regimes the HMM should have is unstable; statistical criteria conflict with trading utility and more states rare…
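The Non-Stationary Transition Matrix failure mode is easy to see by estimating the same transition probability on consecutive windows. A minimal sketch using a deterministic, hypothetical binary state sequence (1 = up, 0 = down): a trending stretch followed by a choppy one. Transitions that straddle a window boundary are ignored, which is an assumption of this sketch.

```python
import numpy as np

def persistence_by_window(states, window, state=1):
    """Estimate P(next = state | current = state) separately in consecutive windows."""
    states = np.asarray(states)
    estimates = []
    for start in range(0, len(states) - window + 1, window):
        w = states[start:start + window]
        current, nxt = w[:-1], w[1:]
        mask = current == state
        # NaN if the state never occurs (with a successor) inside this window.
        estimates.append(float(np.mean(nxt[mask] == state)) if mask.any() else float("nan"))
    return estimates

# Trending stretch (runs of three) followed by a choppy, alternating stretch.
sequence = [1, 1, 1, 0, 0, 0] * 10 + [1, 0] * 30
estimates = persistence_by_window(sequence, window=60)
# P(up -> up) is 2/3 in the first window and 0.0 in the second.
```

A transition matrix fitted on the first window would keep predicting "up follows up" long after the second window's dynamics have taken over.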
Strategies (researched — rounds 3–6)
- 0-1 Strategy — Binary regime-timing rule: hold 100% of a risky asset in the forecast bull/low-volatility regime, 100% cash otherwise.
- Market Making — Continuously quoting bid and ask prices to earn the spread while managing inventory and adverse-selection risk — a canonical sequential d…
- Optimal Execution — Trading a fixed, already-decided position over a horizon to minimise market-impact cost while controlling timing risk — the MDP applicati…
- Recurrent Reinforcement Learning Trading — Direct-reinforcement trading: a recurrent policy outputs positions and is trained by gradient ascent to maximise a risk-adjusted performa…
- Regime-Based Asset Allocation — Tactical strategy that shifts between risky assets and cash/bonds on a detected market regime — the main real trading use of regime-switc…
- Time Series Momentum — Trend-following strategy taking the sign of an asset’s own past return as its position; the standard baseline RL trading papers benchmark…
- VWAP and TWAP Execution — Workhorse execution algorithms — trading in proportion to volume (VWAP) or at a constant rate over time (TWAP) — that are limiting cases…
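The 0-1 Strategy can be sketched as a position rule on a regime probability. This minimal version assumes cash earns zero, a hypothetical proportional `cost_per_trade` charged on every change of position, and a one-period lag so that the position held during period t depends only on the probability known at the end of t-1. The probabilities fed in should be filtered, not smoothed, estimates; otherwise the backtest inherits the lookahead bias described in the seed entities.

```python
import numpy as np

def zero_one_strategy(returns, bull_prob, threshold=0.5, cost_per_trade=0.001):
    """Net per-period returns of a binary regime-timing rule.

    returns: risky-asset returns per period.
    bull_prob: probability of the bull/low-volatility regime, known at each period's close.
    cost_per_trade: hypothetical proportional cost per unit of position change.
    """
    r = np.asarray(returns, dtype=float)
    p = np.asarray(bull_prob, dtype=float)
    signal = (p > threshold).astype(float)            # 1 = hold the risky asset, 0 = cash
    position = np.concatenate(([0.0], signal[:-1]))   # lag one period to avoid lookahead
    turnover = np.abs(np.diff(np.concatenate(([0.0], position))))
    return position * r - turnover * cost_per_trade   # cash is assumed to earn 0

# Usage with made-up numbers: the regime flip after period 1 is only acted on in period 3.
net = zero_one_strategy([0.01, 0.02, -0.03, 0.01], [0.9, 0.8, 0.2, 0.7])
# net is [0.0, 0.019, -0.03, -0.001]
```

The lag means the rule still absorbs the -3% period, and every switch pays costs; this is the mechanism by which real-time identification lag and turnover erode the timing edge.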
Strategies (researched — round 9)
- Style Factor Rotation — Switching a portfolio’s exposure among style-factor models (value, momentum, quality, etc.) according to a detected market regime.