Steelman — Best Case for Markov Trading Models

This note exists to keep the vault honest. The vault’s running conclusion is skeptical: Markov, hidden-Markov, regime-switching and reinforcement-learning models earn their keep as risk filters and allocation aids rather than as standalone alpha engines, and no strong-grade tradeable result exists. A skeptical conclusion is only credible if it has fairly weighed the best counter-evidence. So this note deliberately assembles the strongest possible case FOR Markov-model trading profitability (the most carefully constructed pro-regime studies, the most favourable practitioner data, and any hint of a live or replicated positive result) and then grades that best case by the same rubric applied everywhere else in the vault. The finding, stated plainly up front: even the strongest evidence found grades only moderate. It is real, costed and out-of-sample, but it is modest in magnitude, not replicated by any independent group, and entirely back-tested. The steelman raises the vault’s confidence that regime classification adds measurable value; it does not produce the strong-grade, live-confirmed standalone-alpha result that would overturn the central conclusion.

The single best piece of evidence is Shu and Mulvey 2024 Dynamic Factor Allocation (arXiv:2410.14841). It is the cleanest pro-regime study in the vault on every axis the rubric cares about. A sparse statistical jump model classifies bull/bear regimes for six US equity style factors; those inferences feed a Black-Litterman model that dynamically allocates a long-only, fully invested portfolio across seven investable ETFs: six smart-beta factor ETFs (VLUE, SIZE, MTUM, QUAL, USMV, IWF) plus a market-index ETF. This is a deliberately implementable universe, not the un-investable long-short portfolios typical of academic factor studies. The test is genuinely out-of-sample (2007-2024, expanding window), charges 5bp two-way transaction costs, imposes a one-day delay between regime inference and rebalancing, and benchmarks against an equal-weight portfolio. The result: the information ratio rises from 0.05 to roughly 0.44 versus the market, and is ~0.4-0.5 versus the equal-weight benchmark; maximum drawdown relative to the market shrinks from -10.3% to as little as -5.9%; the absolute Sharpe ratio improves; and every one of the six single-factor evaluation strategies earns a positive Sharpe ratio (0.16-0.39). That is profitable, costed, validated, benchmarked, drawdown-measured behaviour on a realistic universe: the profile of a credible finding.
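
The regime-classification step can be illustrated with a minimal sketch of a basic (non-sparse) statistical jump model: fit K state centroids and a state path that minimise squared distance to the centroids plus a fixed penalty for every regime switch. This is not Shu and Mulvey's implementation. The sparse feature weighting, their feature set and their cross-validated penalty are all omitted, and the function name `fit_jump_model`, the squared-Euclidean loss and the farthest-point initialisation are assumptions chosen for illustration.

```python
import numpy as np


def fit_jump_model(X, n_states=2, jump_penalty=5.0, n_iter=50, seed=0):
    """Fit a basic (non-sparse) statistical jump model by coordinate descent.

    Objective: sum_t ||x_t - mu[s_t]||^2 + jump_penalty * (number of t with s_t != s_{t-1}).
    Alternates between (a) the optimal state path for fixed centroids, found
    exactly by dynamic programming, and (b) centroids updated as state means.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single feature series as a (T, 1) matrix
    T = X.shape[0]
    rng = np.random.default_rng(seed)

    # Farthest-point initialisation: one random observation, then each further
    # centroid is the observation farthest from those already chosen.
    centroids = [X[rng.integers(T)]]
    for _ in range(1, n_states):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[dist.argmax()])
    centroids = np.array(centroids)

    switch_cost = jump_penalty * (1.0 - np.eye(n_states))  # 0 to stay, penalty to jump
    states = None
    for _ in range(n_iter):
        # loss[t, k]: squared distance of observation t to centroid k.
        loss = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

        # Forward pass: value[k] = cheapest cost of a path ending in state k.
        value = loss[0].copy()
        back = np.zeros((T, n_states), dtype=int)
        for t in range(1, T):
            total = value[:, None] + switch_cost  # total[i, j]: state i -> state j
            back[t] = total.argmin(axis=0)
            value = total.min(axis=0) + loss[t]

        # Backward pass: recover the optimal state path.
        new_states = np.empty(T, dtype=int)
        new_states[-1] = value.argmin()
        for t in range(T - 1, 0, -1):
            new_states[t - 1] = back[t, new_states[t]]

        # Centroid step: each centroid moves to the mean of its assigned points.
        for k in range(n_states):
            if np.any(new_states == k):
                centroids[k] = X[new_states == k].mean(axis=0)

        if states is not None and np.array_equal(new_states, states):
            break  # state path stopped changing: converged
        states = new_states
    return states, centroids
```

Raising `jump_penalty` yields fewer, longer regimes. This penalty is the kind of hyperparameter whose cross-validated selection the grading below treats as a source of tuning circularity.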

The supporting tier is consistent with it. Shu Yu and Mulvey 2024 (Journal of Asset Management) is the sibling paper: a 0/1 equity-timing strategy in which a jump-model signal beats both an HMM-guided signal and a buy-and-hold benchmark on the S&P 500, DAX and Nikkei 225 after 10bp costs, primarily by cutting turnover and drawdown; it too grades moderate. Macri et al 2025 shows that feeding an RL trading agent the posterior probabilities of the latent Markov regime “substantially improves both profitability and robustness”, a clean mechanism result. On the practitioner side, Preqin’s database of 152 AI/ML hedge funds showed a 3-percentage-point edge in three-year cumulative return over the all-strategies hedge-fund benchmark, with a higher Sharpe ratio (1.96 vs 1.40). This is the most favourable aggregate live datapoint, and it points the opposite way from the Eurekahedge AI index. Honesty requires noting which candidate did not survive into this tier. Wang Lin Mikhelson 2020, “Regime-Switching Factor Investing with Hidden Markov Models”, reports an HMM factor strategy with a 245% cumulative gain and higher Sharpe and Treynor ratios than static factors over 2007-2017. Superficially it is a striking pro-regime result, but the vault grades it weak because it omits transaction costs entirely, uses leverage, and draws its edge from a short, crash-dominated window. It is included here as the boundary of the steelman: an example of how cost-free, leveraged backtesting inflates apparent regime alpha. The genuine supporting tier (Shu Yu & Mulvey 2024 and Macri et al 2025) is a coherent body of evidence that regime models can add value after costs.
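
The turnover mechanism behind the 0/1 timing result can be made concrete with a small sketch: hold the asset when the regime signal is 1, hold cash (assumed to earn zero) otherwise, lag the signal, and charge a cost on every position change. The helper names `timing_backtest` and `max_drawdown`, the zero cash return, and charging `cost_bp` per unit change in position are assumptions, not details taken from either paper. Here `delay` counts days between the close at which the signal is known and the first daily return it earns; `delay=1` only removes look-ahead, so an extra day of execution lag would correspond to `delay=2` under this convention.

```python
import numpy as np


def timing_backtest(asset_returns, signal, cost_bp=10.0, delay=1):
    """Net daily returns of a 0/1 timing strategy (hypothetical helper).

    position[t] = signal[t - delay]; cash earns zero; each unit change in
    position is charged cost_bp basis points.
    """
    r = np.asarray(asset_returns, dtype=float)
    s = np.asarray(signal, dtype=float)
    position = np.zeros_like(s)
    position[delay:] = s[: len(s) - delay]  # lag the signal to avoid look-ahead
    turnover = np.abs(np.diff(position, prepend=0.0))  # 1.0 on every entry or exit
    return position * r - turnover * cost_bp / 1e4


def max_drawdown(returns):
    """Largest peak-to-trough fall of compounded wealth, as a negative fraction."""
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))])
    peak = np.maximum.accumulate(wealth)
    return float((wealth / peak - 1.0).min())
```

Running a jittery signal and a smoother one over the same returns shows the channel the sibling paper credits: fewer position changes mean less cost drag, and spending time in cash during bad regimes lowers the drawdown that `max_drawdown` reports.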

Now the honest grading, applying the vault’s rubric without flattering the case. The best result, Shu & Mulvey 2024 Dynamic Factor Allocation, falls short of strong on three counts. Magnitude: the active return over the equal-weight benchmark is only ~0.4-1.9% per annum, so the headline is an information-ratio improvement on a thin active-return base, not a large standalone alpha; the paper itself notes the factor edge “nearly disappeared post-2022.” Tuning circularity: the jump-penalty hyperparameters are chosen by cross-validation to maximise the Sharpe ratio of the very evaluation strategy being scored, the same design choice that caps Shu Yu and Mulvey 2024. No independent replication: this paper, Shu Yu & Mulvey 2024, Aydınhan Kolm Mulvey Shu 2024, Cortese Kolm Lindström 2023 and Bosancic Nie Mulvey 2024 all sit inside one Kolm-Lindström-Mulvey-Nystrup research network, and no fully independent group has reproduced a costed, out-of-sample regime-allocation result of this kind. The rest of the case is weaker still: Wang Lin Mikhelson 2020, already excluded from the supporting tier, is a single in-sample-leaning study; Macri et al 2025 is overwhelmingly simulation with a one-week real-data demo; and the Preqin number is short-window vendor marketing data over a broad “AI/ML” universe that isolates nothing Markov-specific, with Preqin itself conceding that “the overall metrics are not significantly different.” Finally, every academic result here is a backtest, so the well-documented Backtest-to-Live Performance Gap applies in full.
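
The magnitude point follows from the arithmetic of the information ratio, which divides annualised active return by tracking error, so a respectable IR can rest on a small active return. The sketch uses the common daily convention (252 periods per year, sample standard deviation); the function name `information_ratio` and the 1% / 2.5% inputs are illustrative assumptions, not figures from Shu & Mulvey 2024.

```python
import numpy as np


def information_ratio(strategy_returns, benchmark_returns, periods_per_year=252):
    """Annualised information ratio: mean active return / tracking error.

    Returns (ir, annual_active_return, annual_tracking_error). Uses the sample
    standard deviation (ddof=1) of per-period active returns, scaled by sqrt(periods).
    """
    active = np.asarray(strategy_returns, dtype=float) - np.asarray(benchmark_returns, dtype=float)
    annual_active = active.mean() * periods_per_year
    tracking_error = active.std(ddof=1) * np.sqrt(periods_per_year)
    return annual_active / tracking_error, annual_active, tracking_error


# Illustrative numbers only (not data from any paper): a 1% per year active
# return with roughly 2.5% tracking error already yields an IR near 0.4.
n_days = 252 * 10
daily_mean = 0.01 / 252
daily_dev = 0.025 / np.sqrt(252)
active = daily_mean + daily_dev * np.where(np.arange(n_days) % 2 == 0, 1.0, -1.0)
ir, ann_active, te = information_ratio(active, np.zeros(n_days))
print(f"IR={ir:.2f}, active={ann_active:.2%}, TE={te:.2%}")  # prints IR=0.40, active=1.00%, TE=2.50%
```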

The steelman’s verdict, then, is a moderate grade, and that verdict is itself informative. The best honest case for Markov-model trading is not nothing: regime classification, properly used inside a disciplined allocation framework with realistic costs, demonstrably improves risk-adjusted performance in out-of-sample backtests. But the best case is also not a strong-grade, live-confirmed, independently replicated standalone-alpha result, because no such result was found despite a deliberate search for it. The steelman therefore confirms rather than overturns the vault’s conclusion: the genuine, defensible value of Markov models is regime-aware risk control and allocation; the Live Regime-Model Evidence Gap for standalone profitable systems remains unfilled; and marketing-grade claims of a profitable “Markov trading system” remain unproven. If a future round surfaces an independent replication of the Shu & Mulvey factor-allocation result, or a disclosed live track record, this note should be revisited: that is the evidence that would move the grade to strong.

Steelman — Best Case for Markov Trading Models [supports] Regime Classification
Shu and Mulvey 2024 Dynamic Factor Allocation [supports] Steelman — Best Case for Markov Trading Models
Shu Yu and Mulvey 2024 [supports] Steelman — Best Case for Markov Trading Models
Macri et al 2025 [supports] Steelman — Best Case for Markov Trading Models
Wang Lin Mikhelson 2020 [relates] Steelman — Best Case for Markov Trading Models
Backtest-to-Live Performance Gap [contradicts] Steelman — Best Case for Markov Trading Models
Live Regime-Model Evidence Gap [contradicts] Steelman — Best Case for Markov Trading Models
Steelman — Best Case for Markov Trading Models [relates] Statistical Jump Model

Connections

Sources