markov-model — Vault Index

Research Goal

Goal: Determine whether Markov-based trading models can be substantiated as profitable trading approaches, distinguishing genuine tradeable edge from academic backtest artefacts, mere regime-classification usefulness, overfitting, and marketing claims.

Goal & Outcomes — Summary

Verdict — research complete (10 rounds, high confidence)

Markov-based models are substantiated as a regime-detection and risk-management component, not as standalone profitable trading systems. Regime classification reliably cuts volatility and drawdown; converting it into directional alpha that beats a benchmark after costs is not substantiated. No surveyed study reaches a strong evidence grade and no credible public live track record exists — a deliberate steelman search did not overturn this.

Research question: Can Markov-based trading models produce profitable, repeatable outcomes after realistic costs and robustness checks?
Answer: Useful component (regime detection / risk control); not a proven standalone alpha source.
Models in scope: Markov chains · Hidden Markov Models · Markov regime-switching · Markov Decision Processes · reinforcement-learning policies
Definitive answer: Conclusion
Goal & Outcomes page: Outcomes

→ Conclusion: the definitive, fully argued verdict. · → Outcomes: round-by-round findings and evidence by model family.

Seed entities (round 1)

  • Major Equity Indices Daily Returns — Daily return series of major US, German and Japanese equity indices (S&P 500, DJIA, NASDAQ 100, DAX, Nikkei 225) used to backtest regime-…
  • Options Market — The market for exchange-traded options — the asset class for the American-option MDP trading model of Nasir et al.
  • US Equity Market — The US stock market — the venue for the portfolio-transaction execution problems studied via MDP and dynamic programming.
  • Hidden Markov Model Regime Detection — HMM infers hidden bull/bear or low/high-vol market regimes from observed returns; mostly a risk filter, with weak standalone alpha after…
  • Markov Chain Trading Model — Discrete-state Markov chain estimating a price-state transition matrix to predict the next up/flat/down state directly as a trading signal.
  • Markov Decision Process Trading Model — Frames trading and execution as a sequential decision problem — state, action, reward, transition, policy — solved by dynamic programming.
  • Markov Regime-Switching Model — Econometric model (Hamilton 1989 lineage) whose AR/volatility parameters switch with a latent Markov state; used for regime detection and…
  • Reinforcement Learning Trading Policy — RL agents that learn a trading policy by trial-and-error reward maximisation; strong backtest literature but weak post-cost, out-of-sampl…
  • Curse of Dimensionality — Exact dynamic-programming solution of an MDP becomes computationally infeasible as state and action spaces grow.
  • Lookahead Bias from Smoothed Regime Estimates — Using full-sample smoothed regime labels in a backtest leaks future data into past decisions, inflating apparent profitability.
  • Non-Stationarity — Market dynamics change over time, so a fixed MDP transition kernel and learned policy degrade out-of-sample.
  • Parameter Instability and Estimation Noise — Regime-switching model parameters are estimated from few in-regime observations and a multi-modal likelihood, making them noisy and unsta…
  • Partial Observability — Latent variables make the true MDP state unobservable, breaking the Markov property and turning the problem into a POMDP.
  • Real-Time Regime Identification Lag — Real-time (filtered) regime inference lags ex-post (smoothed) inference by days to weeks; small misclassification erases any forecasting…
  • Regime Misclassification — The HMM assigns the wrong regime to a period, causing detrimental allocation; errors spike at window edges and during oscillating markets.
  • State-Space Design — The choice of which variables enter the MDP state is unprincipled; a poor state vector breaks the Markov property or invites overfitting.
  • Buy-and-Hold Benchmark — The passive baseline of holding an asset for the whole period; the profitability bar a Markov-chain trading strategy must beat net of costs.
  • Random Walk Benchmark — The naive baseline that next-period price moves are unpredictable; the minimum bar any Markov-chain price predictor must clear.
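
To make the Markov Chain Trading Model entry above concrete, the following is a minimal sketch of estimating an up/flat/down transition matrix by counting consecutive state pairs. It is not taken from any surveyed paper: the function names, the 0.1% flat band and the sample returns are all assumptions chosen for illustration.

```python
from collections import Counter

STATES = ("down", "flat", "up")

def to_state(ret, band=0.001):
    """Map a return to a discrete state; `band` is an assumed flat threshold."""
    if ret > band:
        return "up"
    if ret < -band:
        return "down"
    return "flat"

def transition_matrix(returns, band=0.001):
    """Row-normalised counts of consecutive state pairs (the count-based estimate)."""
    states = [to_state(r, band) for r in returns]
    counts = Counter(zip(states, states[1:]))
    matrix = {}
    for s in STATES:
        row_total = sum(counts[(s, t)] for t in STATES)
        # A state that was never left gets a uniform row instead of a division by zero.
        matrix[s] = {
            t: (counts[(s, t)] / row_total if row_total else 1 / len(STATES))
            for t in STATES
        }
    return matrix

# Hypothetical daily returns, for illustration only.
sample_returns = [0.004, 0.002, -0.003, 0.0005, 0.006, -0.002, -0.004, 0.003]
P = transition_matrix(sample_returns)
```

For this sample, P["up"]["down"] is 2/3 because two of the three moves out of an "up" day go to "down". Both the band and the number of states are free choices, and the matrix here is fitted on the whole sample; a backtest would need to re-estimate it using only data available at each decision date.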

People (researched — rounds 2–7)

  • Adrian Millea — Imperial College London researcher; author of the critical survey that documents the reproducibility and consistency deficit in deep-RL-t…
  • Andrea Frazzini — Quantitative researcher at AQR Capital Management, co-author of empirical work measuring real-world trading costs of asset-pricing anomal…
  • Antoni Wiliński — Polish professor of computer science / quantitative finance; author of the heterogeneous-transition-matrix Markov chain trading paper (Wi…
  • Campbell Harvey — Finance professor at Duke University and former editor of the Journal of Finance; leading researcher on multiple testing and the ‘factor…
  • David H. Bailey — Mathematician (Lawrence Berkeley National Laboratory, UC Davis); co-author of the backtest-overfitting and Deflated Sharpe Ratio literature.
  • Halbert White — Econometrician (UC San Diego, 1950-2012); author of the ‘Reality Check for Data Snooping’, the first general test correcting for specific…
  • James D. Hamilton — Econometrician (UC San Diego) whose 1989 Econometrica paper founded the Markov regime-switching model family; author of the standard Time…
  • John Moody — Computational-finance and machine-learning researcher; originator, with collaborators, of recurrent reinforcement learning (RRL) for trad…
  • John Mulvey — Princeton ORFE professor and co-author of the Shu, Yu & Mulvey regime-switching asset-allocation papers using statistical jump models.
  • Marcos López de Prado — Quant researcher (ADIA, formerly Guggenheim Partners; Cornell) who formalised backtest overfitting, the Deflated Sharpe Ratio and combina…
  • Mark Kritzman — Named quant practitioner — CEO of Windham Capital, MIT Sloan lecturer; co-developed the financial turbulence index and Markov regime-shif…
  • Matthew Saffell — Machine-learning researcher; co-author with John Moody of the foundational recurrent reinforcement learning trading papers and the Differ…
  • Neil Chriss — Mathematician and quantitative-finance practitioner; co-author with Robert Almgren of the foundational 2000 optimal-execution paper.
  • Peter Nystrup — Quantitative researcher (DTU / Lund) who originated the statistical jump model for persistent-regime detection and its sparse, feature-se…
  • Petter Kolm — Quantitative finance professor at NYU Courant; co-author across the sparse and continuous statistical jump model papers connecting the ju…
  • Richard Bellman — Mathematician who founded dynamic programming and the Bellman equation — the solution machinery underneath every MDP.
  • Robert Almgren — Applied mathematician and market-microstructure researcher; co-originator of the Almgren-Chriss optimal-execution framework and of direct…
  • Tobias Moskowitz — Financial economist (Yale School of Management, NBER, AQR), co-author of empirical work on the trading costs of asset-pricing anomalies.
  • Valeriy Zakamulin — Finance professor at the University of Agder; quantitative researcher and skeptic of market-timing and moving-average trading claims.
  • Yan Liu — Finance academic; co-author with Campbell Harvey of the multiple-testing framework for evaluating trading strategies and the ‘factor zoo’…

People (researched — round 9)

  • Cliff Asness — Co-founder of AQR Capital Management; author of ‘The Siren Song of Factor Timing’, the vault’s reference skeptic on factor timing.

Organisations (researched — round 4)

  • Quoniam Asset Management — Frankfurt-based quantitative asset manager; affiliation of co-author Sascha Mergner on Bulla et al. 2010.

Organisations (researched — rounds 6–7)

  • AQR Capital Management — Large quant manager whose live-trade data underpins the leading real-world cost estimate, and which publicly discloses skepticism toward…
  • BlackRock — World’s largest asset manager; publicly frames a ‘new regime’ and runs regime-adaptive macro ML models, but discloses these as risk-aware…
  • Bridgewater Associates — World’s largest hedge fund; its publicly disclosed economic-environment (‘four boxes’) framework is regime classification used for portfo…
  • CFA Institute — Global association of investment professionals; publisher of the Financial Analysts Journal — a peer-reviewed practitioner-academic venue…
  • Citigroup — Global investment bank whose US equity trading-desk data was used to empirically estimate the equity market-impact functions in Almgren e…
  • State Street Associates — Research arm of State Street Global Markets; institutional source of named-quant regime, turbulence and business-cycle research feeding t…

Organisations (researched — round 9)

  • Man Group — Listed alternative-investment manager; its Man AHL / Man Numeric quant units publish named-author regime research (the ‘Regimes’ paper, M…
  • PIMCO — Global fixed-income manager; publicly organises its process around macroeconomic regimes via the Secular/Cyclical Outlook framework — a q…
  • Two Sigma — Quantitative hedge fund built on machine learning; publishes regime/factor-environment commentary (Street View, Venn factor lens) but dis…

Concepts (researched — rounds 2–10)

  • AI Hedge Fund Index Underperformance — The Eurekahedge AI Hedge Fund Index of ML-driven funds has underperformed passive benchmarks in live data — the aggregate real-money evid…
  • Backtest-to-Live Performance Gap — The systematic, documented deterioration of trading strategies between published backtest and live deployment — the reason this vault tre…
  • Bellman Equation — Recursive optimality relation expressing a state’s optimal value as best immediate reward plus discounted value of the successor state.
  • Combinatorial Purged Cross-Validation — López de Prado’s CV method that purges leaking samples, embargoes adjacent ones, and combinatorially builds many backtest paths to yield…
  • Currency Carry Trade — FX strategy of borrowing low-interest-rate currencies to fund high-interest-rate ones; profitable on average but exposed to sudden negati…
  • Deflated Sharpe Ratio — Bailey & López de Prado’s adjusted Sharpe ratio that deflates an observed result for the number of trials and for non-normal returns, exp…
  • Differential Sharpe Ratio — An online, per-step approximation of the Sharpe ratio built from exponential moving averages of returns; the standard risk-adjusted rewar…
  • Dynamic Programming — Bellman’s method for solving multi-stage decision problems by recursion on the value function — the classical, model-based way to solve a…
  • Factor Timing — Dynamically tilting a portfolio toward equity factors expected to outperform — a practice whose reliability is contested (‘the siren song’).
  • Financial Turbulence Index — Mahalanobis-distance measure of statistical ‘unusualness’ in asset returns (Kritzman & Li 2010); a regime/outlier risk indicator used to…
  • Goodhart’s Law — When a measure becomes a target it ceases to be a good measure — the principle behind reward and metric misspecification.
  • Hawkes Process — A self-exciting point process whose intensity depends on the history of past events — the standard non-Markovian model of clustered order…
  • Implementation Shortfall — The gap between a trade’s notional value at the decision (arrival) price and the value actually achieved after impact, fees and timing —…
  • Live Regime-Model Evidence Gap — The gap between disclosed institutional use of regime CLASSIFICATION (real) and any public live track record of a standalone profitable M…
  • Live Trading Evidence — Disclosed, costed, real-capital track records — the missing top tier of trading-strategy evidence; no credible live Markov-model record h…
  • Maximum Drawdown — The largest peak-to-trough loss of an equity curve; the risk denominator of the Calmar ratio and a core backtest risk metric.
  • Maximum Likelihood Estimation — Estimating model parameters by maximising the probability the model assigns to the observed data; the principle Baum-Welch implements for…
  • Median Filter Smoothing — Post-processing step that smooths a noisy regime-signal sequence to suppress short-lived flips and reduce turnover.
  • Out-of-Sample Backtesting — Evaluating a trading model on data not used to fit it — the imperfect primary defence against overfitting and the core evidence standard…
  • Recent Developments 2024-2025 — 2024-2025 work on neural regime detection, FinRL benchmarks and generative RL — better tooling and honest benchmarks, but the same unsolv…
  • Regime Classification — Labelling the market’s current latent state (bull/bear, calm/crisis, low/high-vol) — well-evidenced as a risk filter, far weaker as a sta…
  • Replication Crisis in Quantitative Finance — Difficulty independently re-running and confirming published quantitative-trading results — driven by missing code, undisclosed settings,…
  • Square-Root Law of Market Impact — The empirically observed regularity that the price impact of a large metaorder scales as the square root of order size relative to traded…
  • Steelman — Best Case for Markov Trading Models — The strongest honest case FOR Markov/HMM/regime/RL trading profitability — collected to stress-test the vault’s skeptical conclusion; the…
  • Tactical Asset Allocation — Actively tilting portfolio exposure on short-horizon signals (often regime or trend) to avoid tail risk — the practical setting where reg…
  • Value Iteration — Dynamic-programming algorithm that solves an MDP by repeatedly applying the Bellman optimality operator until the value function converges.
  • Volatility Regime — A market regime defined by the variance of returns (low-variance vs high-variance) rather than by directional return sign.
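
The Bellman Equation, Dynamic Programming and Value Iteration entries above describe one procedure: repeatedly replace each state's value with the best immediate reward plus the discounted expected value of the successor state. A minimal sketch on a toy two-regime problem follows. The states, actions, rewards and transition probabilities are hypothetical numbers invented for illustration, not estimates from any study in the vault.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-10, max_iter=10_000):
    """
    transitions[s][a] is a list of (probability, next_state) pairs.
    rewards[s][a] is the immediate reward for action a in state s.
    Returns (values, greedy_policy).
    """
    def q(s, a, values):
        # Immediate reward plus discounted expected value of the successor state.
        return rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in transitions[s][a])

    values = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        # One application of the Bellman optimality operator to every state.
        new_values = {s: max(q(s, a, values) for a in transitions[s]) for s in transitions}
        delta = max(abs(new_values[s] - values[s]) for s in transitions)
        values = new_values
        if delta < tol:
            break
    policy = {s: max(transitions[s], key=lambda a: q(s, a, values)) for s in transitions}
    return values, policy

# Toy problem (all numbers hypothetical): the regime evolves independently of the action.
calm_next = [(0.9, "calm"), (0.1, "stress")]
stress_next = [(0.3, "calm"), (0.7, "stress")]
toy_transitions = {
    "calm": {"hold": calm_next, "cash": calm_next},
    "stress": {"hold": stress_next, "cash": stress_next},
}
toy_rewards = {
    "calm": {"hold": 1.0, "cash": 0.0},
    "stress": {"hold": -2.0, "cash": 0.0},
}
values, policy = value_iteration(toy_transitions, toy_rewards)
```

Because the action does not influence the regime in this toy setup, the resulting policy is simply to hold in the calm state and sit in cash in the stress state. The sketch assumes the state is observed and the transition probabilities are known and fixed, which is exactly what the partial-observability and non-stationarity entries in this index call into question.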

General (researched — round 10)

  • Conclusion — Verdict: Markov-based trading models are a useful regime/risk-management component, not a substantiated standalone profitable trading sys…

Backtest Results (researched — round 9)

  • Calmar Ratio — Risk-adjusted performance metric (annualised return over maximum drawdown) used by Wiliński 2019 to report Markov-chain trading profitabi…
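
As a worked illustration of the definition above (annualised return over maximum drawdown), here is a minimal sketch. The function names, the compound-return annualisation and the sample equity curve are assumptions for illustration; published Calmar figures may use a different window or annualisation convention.

```python
def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a positive fraction."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def calmar_ratio(equity, periods_per_year=252):
    """Annualised compound return divided by maximum drawdown."""
    years = (len(equity) - 1) / periods_per_year
    annual_return = (equity[-1] / equity[0]) ** (1 / years) - 1
    mdd = max_drawdown(equity)
    return float("inf") if mdd == 0 else annual_return / mdd

# Hypothetical quarterly equity curve covering one year.
curve = [100.0, 120.0, 90.0, 110.0, 121.0]
calmar = calmar_ratio(curve, periods_per_year=4)
```

In this example the curve gains 21% over the year and its worst decline is from 120 to 90, a 25% drawdown, so the ratio is 0.21 / 0.25 = 0.84.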

Datasets (researched — round 4)

  • QuantConnect — Open algorithmic-backtesting and trading platform used to build and backtest the factor models in Wang Lin Mikhelson 2020.

Datasets (researched — round 8)

  • Pinnacle Data Corp CLC Database — Commercial database of ratio-adjusted continuous futures contracts; the 50-contract universe used in the vault’s strongest positive RL-tr…

Datasets (researched — round 10)

  • FinRL Framework and Contests — Open-source deep-RL trading framework and its 2023-2025 benchmarking contests; standardised MDP environments and out-of-sample evaluation…

Markets (researched — round 6)

  • Limit Order Book — The order-matching mechanism of modern electronic markets — the environment most MDP execution and market-making models trade within, and…

Markets (researched — round 8)

  • Cryptocurrency Market — Volatile 24/7 digital-asset market; the dominant RL/Markov trading testbed, where backtest overfitting inflates positive results and cras…
  • DAX — German blue-chip equity index; a deeply liquid non-US robustness market for HMM regime-switching strategies.
  • EUR USD Currency Pair — The euro/US-dollar pair — the single most liquid market in the world; the canonical low-cost test ground for Markov-chain and RL FX tradi…
  • Futures Markets — Exchange-traded standardised futures across commodities, equity indices, fixed income and FX — the test universe for the strongest positi…
  • Nikkei 225 — Japanese equity index; the deep-liquidity test market where HMM regime filtering most clearly improved Sharpe by avoiding Japan’s prolong…
  • OMXS30 Index — Benchmark index of the 30 most-traded Nasdaq Stockholm stocks; a smaller developed-market test bed where higher relative costs would erod…
  • S&P 500 — US large-cap equity index; the deepest-liquidity, longest-history test market for HMM regime-detection and regime-switching trading studies.
  • WIG20 Index — Benchmark index of 20 large Warsaw Stock Exchange companies; a smaller emerging-market equity testbed where higher costs undercut Wilinsk…

Models (researched — round 3)

  • Statistical Jump Model — Non-parametric regime model that enforces persistence via an explicit jump penalty; calmer regimes and lower turnover than the HMM, but i…

Models (researched — rounds 9–10)

  • Baum-Welch Estimation — The Expectation-Maximisation algorithm that fits HMM transition and emission parameters by maximum likelihood from an observed return seq…
  • Forward-Backward Algorithm — Dynamic-programming recursion computing per-state posterior probabilities; the inference engine inside Baum-Welch’s E-step.
  • Hidden Semi-Markov Model — Generalisation of the HMM that models explicit state-duration (sojourn) distributions instead of assuming geometric regime lengths.
  • K-Means Regime Clustering — Unsupervised clustering (k-means and fuzzy variants) used to partition market or macroeconomic data into regimes — a machine-learning alt…
  • Neural Regime Model — 2024-2025 neural-network-augmented regime detectors — neural HMMs, attention/transformer regime models, LLM-assisted classifiers; richer…
  • Similarity-Based Regime Detection — Non-parametric regime detection that classifies the current period by distance/similarity to historical periods, without presupposing a f…
  • Viterbi Decoding — Dynamic-programming algorithm that recovers the single most probable global sequence of hidden regimes from a fitted HMM.
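
The Forward-Backward Algorithm entry above can be sketched in a few lines, and doing so makes the filtered versus smoothed distinction visible: the forward pass uses only data up to each step, while the backward pass folds in later observations. This is a minimal pure-Python sketch for a Gaussian-emission HMM whose parameters are taken as given. The parameter values and the observation series are hypothetical, and in practice the parameters would first be fitted, for example by Baum-Welch.

```python
import math

def gaussian_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def forward_backward(obs, start, trans, means, stds):
    """
    Scaled forward-backward recursion for a Gaussian-emission HMM.
    Returns (filtered, smoothed): state probabilities at each step given data
    up to that step, and given the full sample, respectively.
    """
    n = len(start)
    emis = [[gaussian_pdf(x, means[k], stds[k]) for k in range(n)] for x in obs]

    # Forward pass: normalised alpha is P(state_t | obs[0..t]).
    filtered, scales = [], []
    prev = start
    for t, e in enumerate(emis):
        if t == 0:
            pred = start
        else:
            pred = [sum(prev[i] * trans[i][k] for i in range(n)) for k in range(n)]
        unnorm = [pred[k] * e[k] for k in range(n)]
        c = sum(unnorm)
        prev = [u / c for u in unnorm]
        filtered.append(prev)
        scales.append(c)

    # Backward pass, scaled by the same constants to avoid underflow.
    beta = [1.0] * n
    smoothed = [None] * len(obs)
    smoothed[-1] = filtered[-1]
    for t in range(len(obs) - 2, -1, -1):
        beta = [
            sum(trans[i][k] * emis[t + 1][k] * beta[k] for k in range(n)) / scales[t + 1]
            for i in range(n)
        ]
        post = [filtered[t][i] * beta[i] for i in range(n)]
        total = sum(post)
        smoothed[t] = [p / total for p in post]
    return filtered, smoothed

# Hypothetical parameters: state 0 = calm, state 1 = high volatility.
start = [0.5, 0.5]
trans = [[0.95, 0.05], [0.10, 0.90]]
means, stds = [0.001, -0.002], [0.005, 0.02]
obs = [0.002, 0.001, -0.001, -0.03, 0.025, -0.02, 0.003]
filtered, smoothed = forward_backward(obs, start, trans, means, stds)
```

At the step just before the large negative return (index 2), the smoothed probability of the high-volatility state is higher than the filtered one, because the smoothed estimate has already seen the shock that follows. A backtest that trades on smoothed probabilities is therefore using information that was not available at the time.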

Papers (researched — rounds 2–10)

  • Abbade and Reali Costa 2026 — Builds RL trading environments with nonlinear market-impact models; shows flat 10bp fees inflate backtests and the cost model changes whi…
  • Almgren Chriss 2000 — Foundational optimal-execution model: liquidate a fixed position over a horizon as a mean-variance trade-off between market-impact cost a…
  • Almgren Thum Hauptmann Li 2005 — Empirically fits a permanent+temporary equity market-impact model to Citigroup trade data; rejects pure square-root for a 3/5 power law o…
  • And the Cross-Section of Expected Returns — Documents the ‘factor zoo’ and shows the conventional t>2.0 significance hurdle is far too lax under collective data snooping.
  • Ang and Timmermann 2012 — Ang & Timmermann’s canonical survey of regime-switching in finance — defines regimes and frames their portfolio value as risk management,…
  • Aronsson Folkesson 2023 — KTH 2023 BSc degree project applying first/second-order discrete Markov chains to OMXS30 for next-day state prediction; barely beats rand…
  • Avellaneda-Stoikov 2008 — Foundational stochastic-control model of market making: inventory-skewed reservation prices and a spread calibrated to Poisson order arri…
  • Aydınhan Kolm Mulvey Shu 2024 — Extends the jump model to a continuous probability-vector regime representation (CJM) with a mode-loss penalty; methodology extension by…
  • Baitinger & Hoch 2024 — Baitinger & Hoch (2024): the complex HSMM beats the plain HMM only in-sample; the advantage ‘largely disappears’ out-of-sample, and more…
  • Bandarupalli 2025 — Cost-aware out-of-sample PPO RL agent on BTC/ETH/SPY underperformed buy-and-hold (Sharpe 1.23 vs 1.46) — RL crypto trading fails to beat…
  • Bazzi Blasques Koopman Lucas 2017 — Methodological paper proposing score-driven time-varying transition probabilities for Markov regime-switching models — the econometric re…
  • Borrageiro Firoozye Barucca 2022 — Modern peer-reviewed direct-RRL FX agent; 7-year out-of-sample test net of transaction and funding costs yields only a 0.52 information r…
  • Bosancic Nie Mulvey 2024 — Regime-aware factor allocation paper that applies statistical jump models with optimal feature selection to long-only equity factor portf…
  • Bouye and Teiletche 2025 — CFA Institute / Financial Analysts Journal 2025 paper modelling economic regimes as mixtures of distributions for strategic asset allocat…
  • Brunnermeier Nagel Pedersen 2008 — NBER study showing FX carry-trade returns are negatively skewed and crash-prone — investment currencies fall sharply when funding liquidi…
  • Bulla et al. 2010 — Bulla et al.’s out-of-sample test of a Markov-switching equity timing strategy; profitable after 10bp costs, but the edge is mostly ~41%…
  • Chappell 2018 — Among the first papers to apply Markov regime-switching/HMM to Bitcoin returns; a 5-state model best fits crypto volatility regimes — reg…
  • Chen and Zimmermann 2022 — Open-source replication of 319 cross-sectional return predictors; reproduces nearly all in-sample, and (with later work) confirms sharp p…
  • Cortese Kolm Lindström 2023 — Applies the sparse statistical jump model to cryptocurrency returns; identifies a three-state bull/neutral/bear structure and its drivers…
  • Dacco and Satchell 1999 — Dacco & Satchell prove analytically that a small real-time regime-misclassification rate makes even the TRUE regime-switching model forec…
  • Ding Granger Engle 1993 — Canonical empirical paper documenting long-memory dependence in absolute stock returns — the persistence a memoryless first-order Markov…
  • Do Carmo 2017 — 2017 Lisbon master’s dissertation reconstructing equity return processes as N-symbol K-length discrete Markov chains; beats random on an…
  • Frazzini Israel and Moskowitz 2018 — Measures real-world trading costs from $1tn of live AQR trades; finds costs ~9-11bp and that size/value/momentum survive, short-term reve…
  • Goldfeld and Quandt 1973 — Goldfeld & Quandt’s 1973 paper introducing Markov-switching regression — the direct precursor Hamilton 1989 extended to autoregressive mo…
  • Gort et al. 2022 — Hypothesis-test framework to reject overfitted DRL crypto agents via combinatorial cross-validation; the least-overfitted accepted PPO ag…
  • Hambly Xu Yang 2023 — Survey establishing the Markov Decision Process as the standard setting for reinforcement learning across finance decision problems.
  • Hamilton 1989 — Hamilton’s 1989 Econometrica paper introducing autoregressive models with Markov-switching parameters — the founding paper of the regime-…
  • Kritzman and Li 2010 — Kritzman & Li’s FAJ paper defining the financial turbulence index — a Mahalanobis-distance outlier measure used to scale risk, not a trad…
  • Kritzman Page Turkington 2012 — CFA Institute / FAJ 2012 paper applying Markov-switching models to forecast regimes in turbulence, inflation and growth for dynamic asset…
  • Lalor Swishchuk 2025 — Deep-RL market-making study under semi-Markov and Hawkes dynamics — names the non-Markov problem and the phantom-gain backtest artefacts…
  • Macri et al 2025 — Macri et al. show that embedding latent-Markov-regime probabilities into a DDPG+GRU RL trader improves profit and robustness — but the ev…
  • Macroeconomic Regimes and Regime Shifts — Hamilton’s NBER WP 21863 survey of regime-switching econometrics — formulation, filtered vs smoothed inference, regime-count testing, mul…
  • McLean and Pontiff 2016 — Replicates 97 published return predictors; finds anomaly returns 26% lower out-of-sample and 58% lower post-publication — the headline me…
  • Mettle et al 2024 — Peer-reviewed 2024 paper modelling monthly returns of five national markets as a Markov chain random walk for descriptive risk ranking; n…
  • Millea 2021 — Critical survey of 152 deep-RL-trading papers; documents methodological fragmentation and a reproducibility deficit that block aggregatin…
  • Moody and Saffell 2001 — Foundational 2001 IEEE paper introducing recurrent reinforcement learning (RRL) and the Differential Sharpe Ratio for trading; encouragin…
  • Moody Wu Liao Saffell 1998 — Earliest full recurrent reinforcement learning trading paper; introduced the Differential Sharpe Ratio and the S&P 500 1970-1994 out-of-s…
  • Mulliner et al. 2025 — Man Group / Harvey ‘Regimes’ paper: a non-parametric similarity-based regime model times six equity factors; reports a 0.82-Sharpe anti-r…
  • Nasir et al 2021 — Builds a model-based MDP for American-option trading, solved by value iteration on Microsoft and Coca-Cola options data.
  • Nystrup Kolm Lindström 2021 — Introduces the sparse statistical jump model — joint feature selection plus jump-penalised clustering — robust to noise; a methodology pa…
  • Nystrup Lindström Madsen 2020 — Originating paper for the statistical jump model — a jump-penalised HMM estimator; shows better persistence estimates cut trading costs,…
  • Oliveira et al. 2025 — Oliveira et al. classify regimes by modified k-means over FRED-MD macro data and report regime-conditioned ETF portfolios beating SPY — b…
  • Patton and Weller 2017 — Measures the gap between on-paper factor returns and what mutual funds deliver; implementation costs 2.2-8.5%/yr and momentum net return…
  • Paykan et al. 2025 — DRL (SAC/DDPG+LSTM) crypto portfolio agent beats a Markowitz benchmark but with a -40.9% drawdown and no hard buy-and-hold benchmark — a…
  • Pedersen 2023 — Compares analytical, numerical, dynamic-programming and RL solutions to the Almgren-Chriss optimal-execution MDP.
  • Probability of Backtest Overfitting — Introduces PBO and the CSCV method that estimates the probability a backtest is overfit from only a matrix of returns.
  • Pseudo-Mathematics and Financial Charlatanism — Foundational proof that backtest overfitting is near-unavoidable and, under memory effects, produces negative out-of-sample returns.
  • Quandt 1958 1972 — Quandt’s switching-regression papers (1958, 1972) — the earliest two-regime regression models that pre-date and feed into the Markov regi…
  • Shu and Mulvey 2024 Dynamic Factor Allocation — Shu & Mulvey use a sparse jump model + Black-Litterman to dynamically allocate across six factor ETFs; OOS 2007-2024 with costs, IR rises…
  • Shu Yu and Mulvey 2024 — Shu, Yu & Mulvey benchmark HMM vs statistical jump model regime signals on equity indices 1990-2023 with costs; the jump model wins on tu…
  • Shu Yu and Mulvey 2024 Dynamic Allocation — Shu, Yu & Mulvey’s 12-asset regime-aware allocation paper; jump-model + XGBoost regime forecasts lift OOS Sharpe and halve drawdown after…
  • Stop-Loss Regime Switching 2018 — Elsevier paper deriving closed-form returns of stop-loss/regime-exit rules; tight rules underperform buy-and-hold on US stocks due to exc…
  • Sun Wang An 2021 — Comprehensive survey of 100+ RL-for-quantitative-trading papers; builds a four-task taxonomy and states the field’s unsolved open challen…
  • The 10 Reasons Most Machine Learning Funds Fail — López de Prado’s practitioner catalogue of ten recurring mistakes — research-through-backtesting, walk-forward overfitting, undisclosed t…
  • Wang Lin Mikhelson 2020 — Wang, Lin & Mikhelson (2020): a 3-state HMM rotates style-factor models; reports a 2.0 Sharpe out-of-sample, but with no costs, leverage,…
  • Wilinski 2019 — Expert Systems with Applications 2019 paper: heterogeneous rolling-window Markov chain; peer-reviewed profit claim but ML-tuned, no discl…
  • Zakamulin 2016 — Zakamulin shows a celebrated moving-average timing strategy’s alpha is pure look-ahead bias; corrected, it is no better than buy-and-hold.
  • Zhang Zohren Roberts 2019 — Oxford-Man Institute study training DQN/PG/A2C agents on 50 futures contracts; reports out-of-sample risk-adjusted outperformance over mo…

Risks and Failure Modes (researched — round 2)

  • Data-Snooping Bias — Inflated apparent performance from testing many models/strategies on the same data and reporting only the winners; the multiple-testing f…
  • Overfitting in Quantitative Trading — Fitting a model to noise in historical data so a backtest looks profitable but the edge fails out-of-sample; under serial dependence it p…
  • Transaction Costs and Slippage — Commissions, spreads, market impact and slippage that erode or erase the gross profits of Markov trading models, especially high-turnover…

Risks and Failure Modes (researched — rounds 6–7)

  • Adverse Selection — The market maker’s risk of being systematically filled by better-informed counterparties — the core cost of liquidity provision and a rou…
  • Phantom Gains in Backtests — Inflated backtest P&L produced when a market-making simulator omits adverse fills, queue position, or bid/ask (not mid-price) execution.
  • Sim-to-Real Gap — The discrepancy between an agent’s performance in a historical-data simulation and in a live market — the reinforcement-learning-specific…

Risks and Failure Modes (researched — rounds 9–10)

  • First-Order Memory Assumption — The Markov property assumption that the next price state depends only on the current state, discarding trend and longer-range dependence.
  • HMM Parameter Instability — HMM parameter estimates are noisy and unstable on short or imbalanced samples, especially data-starved bear regimes, degrading regime inf…
  • Non-Stationary Transition Matrix — The core failure mode of Markov-chain trading: estimated transition probabilities drift over time, so a fixed matrix misforecasts the nex…
  • Reward Design Sensitivity — RL trading results depend heavily on the choice of reward function and hyperparameters, making reported edges fragile and easy to overfit.
  • Reward Specification Error — A mis-specified MDP reward is silently optimised, producing policies that maximise the proxy rather than genuine risk-adjusted profit.
  • State Definition Arbitrariness — Markov-chain results depend heavily on the arbitrary choice of how many price states to use and where to set the bucket thresholds.
  • State-Count Selection — Choosing how many hidden regimes the HMM should have is unstable; statistical criteria conflict with trading utility and more states rare…

Strategies (researched — rounds 3–6)

  • 0-1 Strategy — Binary regime-timing rule: hold 100% of a risky asset in the forecast bull/low-volatility regime, 100% cash otherwise.
  • Market Making — Continuously quoting bid and ask prices to earn the spread while managing inventory and adverse-selection risk — a canonical sequential d…
  • Optimal Execution — Trading a fixed, already-decided position over a horizon to minimise market-impact cost while controlling timing risk — the MDP applicati…
  • Recurrent Reinforcement Learning Trading — Direct-reinforcement trading: a recurrent policy outputs positions and is trained by gradient ascent to maximise a risk-adjusted performa…
  • Regime-Based Asset Allocation — Tactical strategy that shifts between risky assets and cash/bonds on a detected market regime — the main real trading use of regime-switc…
  • Time Series Momentum — Trend-following strategy taking the sign of an asset’s own past return as its position; the standard baseline RL trading papers benchmark…
  • VWAP and TWAP Execution — Workhorse execution algorithms — trading in proportion to volume (VWAP) or at a constant rate over time (TWAP) — that are limiting cases…
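
The 0-1 Strategy entry above is simple enough to state as code. The sketch below computes net per-period returns for the binary rule with a proportional cost charged on every switch. The function name, the zero return on cash, the 10bp default and the sample inputs are assumptions for illustration, not a reproduction of any paper's backtest.

```python
def zero_one_strategy(returns, bull_signal, cost_bps=10.0):
    """
    Net per-period returns of a binary regime-timing rule.

    bull_signal[t] must be the regime forecast made BEFORE period t's return
    is known. The position is 1.0 (fully invested) when it is True and 0.0
    (cash, assumed to earn zero) otherwise. A proportional cost of `cost_bps`
    basis points is charged on each change of position.
    """
    cost = cost_bps / 10_000
    position = 0.0
    net = []
    for ret, bull in zip(returns, bull_signal):
        target = 1.0 if bull else 0.0
        turnover = abs(target - position)
        position = target
        net.append(position * ret - turnover * cost)
    return net

# Hypothetical inputs, for illustration only.
asset_returns = [0.01, -0.02, 0.015, 0.005]
signal = [True, False, True, True]
net_returns = zero_one_strategy(asset_returns, signal)
```

With these inputs the rule switches three times, so three of the four periods carry a 10bp charge; a signal that flips often pays this cost repeatedly, which is why turnover matters as much as classification accuracy.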

Strategies (researched — round 9)

  • Style Factor Rotation — Switching a portfolio’s exposure among style-factor models (value, momentum, quality, etc.) according to a detected market regime.

Open Questions