Steelman — Best Case for Markov Trading Models

This note exists to keep the vault honest. The vault’s running conclusion is skeptical — that Markov, hidden-Markov, regime-switching and reinforcement-learning models earn their keep as risk filters and allocation aids rather than as standalone alpha engines, and that no strong-grade tradeable result exists. A skeptical conclusion is only credible if it has fairly weighed the best counter-evidence. So this note deliberately assembles the strongest possible case FOR Markov-model trading profitability — the most carefully constructed pro-regime studies, the most favourable practitioner data, and any hint of a live or replicated positive result — and then grades that best case by the same rubric applied everywhere else in the vault. The finding, stated plainly up front: even the strongest evidence found grades moderate. It is real, it is costed, it is out-of-sample — but it is modest in magnitude, un-replicated by an independent group, and entirely back-tested. The steelman improves the vault’s confidence that regime classification adds measurable value; it does not produce the strong-grade, live-confirmed standalone-alpha result that would overturn the central conclusion.

The single best piece of evidence is Shu and Mulvey 2024 Dynamic Factor Allocation (arXiv:2410.14841). It is the cleanest pro-regime study in the vault on every axis the rubric cares about. A sparse statistical jump model classifies bull/bear regimes for six US equity style factors; those inferences feed a Black-Litterman model that dynamically allocates a long-only, fully-invested portfolio across seven investable smart-beta ETFs (VLUE, SIZE, MTUM, QUAL, USMV, IWF plus a market index) — a deliberately implementable universe, not the un-investable academic long-short portfolios. The test is genuinely out-of-sample (2007-2024, expanding window), charges 5bp two-way transaction costs, imposes a one-day delay between regime inference and rebalancing, and benchmarks against an equal-weight portfolio. The result: the information ratio rises from 0.05 to roughly 0.44 versus the market, and ~0.4-0.5 versus the equal-weight benchmark; maximum drawdown relative to the market falls from -10.3% to as low as -5.9%; the absolute Sharpe ratio improves; every one of the six single-factor evaluation strategies earns a positive Sharpe (0.16-0.39). That is profitable, costed, validated, benchmarked, drawdown-measured behaviour on a realistic universe — the profile of a credible finding.
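
The evaluation mechanics described above (a delay between signal and trade, proportional costs, an information ratio against a benchmark, and a drawdown measure) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: the function names, the equal-weight starting allocation, the 252-day annualisation, and the reading of the 5bp figure as a charge per unit of absolute weight change are all assumptions made here, and intraday weight drift is ignored.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualisation factor


def net_returns(weights, asset_returns, cost_bps=5.0, delay=1, initial=None):
    """Daily portfolio returns after a signal delay and proportional costs.

    weights[t] is the allocation decided with information up to day t.
    It is only held from day t + delay onwards (hypothetical convention).
    """
    weights = np.asarray(weights, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)
    n_days, n_assets = weights.shape
    if not 0 <= delay < n_days:
        raise ValueError("delay must satisfy 0 <= delay < number of days")
    if initial is None:
        # Assumption: start from an equal-weight portfolio before any signal.
        initial = np.full(n_assets, 1.0 / n_assets)

    held = np.empty_like(weights)
    held[:delay] = initial
    held[delay:] = weights[: n_days - delay]

    gross = (held * asset_returns).sum(axis=1)
    # Turnover is the sum of absolute weight changes; day 0 trades nothing.
    turnover = np.abs(np.diff(held, axis=0, prepend=held[:1])).sum(axis=1)
    return gross - turnover * cost_bps / 1e4


def information_ratio(strategy, benchmark):
    """Annualised mean active return divided by annualised tracking error."""
    active = np.asarray(strategy, dtype=float) - np.asarray(benchmark, dtype=float)
    return active.mean() / active.std(ddof=1) * np.sqrt(TRADING_DAYS)


def max_drawdown(returns):
    """Most negative peak-to-trough loss of the compounded return series."""
    wealth = np.concatenate([[1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))])
    peak = np.maximum.accumulate(wealth)
    return (wealth / peak - 1.0).min()


if __name__ == "__main__":
    # Toy two-asset example with made-up numbers, for illustration only.
    w = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    r = np.array([[0.01, 0.0], [0.02, -0.01], [0.0, 0.03]])
    strat = net_returns(w, r)
    bench = r.mean(axis=1)  # equal-weight benchmark
    print(strat, information_ratio(strat, bench), max_drawdown(strat))
```

To measure drawdown relative to a benchmark rather than in absolute terms, one option is to pass the relative return series `(1 + strat) / (1 + bench) - 1` to `max_drawdown`.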

The supporting tier is consistent with it. Shu Yu and Mulvey 2024 (Journal of Asset Management) is the sibling paper: a 0/1 equity-timing strategy in which a jump-model signal beats both an HMM-guided one and a buy-and-hold benchmark on the S&P 500, DAX and Nikkei 225 after 10bp costs, primarily by cutting turnover and drawdown. It is also graded moderate. Macri et al 2025 shows that feeding an RL trading agent posterior probabilities of the latent Markov regime “substantially improves both profitability and robustness”, a clean mechanism result. On the practitioner side, Preqin’s database of 152 AI/ML hedge funds showed a +3-percentage-point three-year cumulative-return edge over the all-strategies hedge-fund benchmark with a higher Sharpe (1.96 vs 1.40). That is the most favourable aggregate live datapoint, and it points the opposite way from the Eurekahedge AI index. Honesty requires noting which candidate did not survive into this tier. Wang Lin Mikhelson 2020’s “Regime-Switching Factor Investing with Hidden Markov Models” reports an HMM factor strategy with a 245% cumulative gain and higher Sharpe and Treynor ratios than static factors over 2007-2017, and is superficially a striking pro-regime result. The vault nevertheless grades it weak because it omits transaction costs entirely, uses leverage, and draws its edge from a short crash-dominated window; it is included here as the boundary of the steelman, an example of how cost-free leveraged backtesting inflates apparent regime alpha. The genuine supporting tier, Shu Yu & Mulvey 2024 and Macri et al., is a coherent body of evidence that regime models can add value after costs.

Now the honest grading, applying the vault’s rubric without flattering the case. The best result, Shu & Mulvey 2024 Dynamic Factor Allocation, falls short of strong on three counts. Magnitude: the active return over the equal-weight benchmark is only ~0.4-1.9% per annum, so the headline is an information-ratio improvement on a thin active-return base, not a large standalone alpha; the paper itself notes the factor edge “nearly disappeared post-2022.” Tuning circularity: the jump-penalty hyperparameters are chosen by cross-validation to maximise the Sharpe of the very evaluation strategy being scored, the same design choice that caps Shu Yu and Mulvey 2024. No independent replication: this paper, Shu Yu & Mulvey 2024, Aydınhan Kolm Mulvey Shu 2024, Cortese Kolm Lindström 2023 and Bosancic Nie Mulvey 2024 all sit inside one Kolm-Lindström-Mulvey-Nystrup research network, and no fully independent group has reproduced a costed, out-of-sample regime-allocation result of this kind. The remaining evidence is weaker still: Wang Lin Mikhelson 2020, already excluded from the supporting tier, is a single in-sample-leaning study; Macri et al 2025 is overwhelmingly simulation with a one-week real-data demo; the Preqin number is short-window vendor marketing data over a broad “AI/ML” universe that isolates nothing Markov-specific, and Preqin itself concedes “the overall metrics are not significantly different.” And every academic result here is a backtest, so the well-documented Backtest-to-Live Performance Gap applies in full.
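
The magnitude point rests on how the information ratio is built, which the following worked arithmetic makes explicit. It uses the standard definition of the ratio. The 1.0% active return is one point chosen inside the reported ~0.4-1.9% range, and the 2.5% tracking error is a hypothetical figure picked for illustration; neither pairing is taken from the paper.

```latex
% Information ratio: annualised active return over annualised tracking error.
\[
  \mathrm{IR} \;=\; \frac{\mathbb{E}\,[\,r_p - r_b\,]}{\sigma(r_p - r_b)}
\]
% Illustrative numbers only (tracking error is an assumption):
\[
  \mathrm{IR} \;=\; \frac{1.0\%}{2.5\%} \;=\; 0.40
\]
```

A ratio near 0.4 is therefore compatible with an active return of about one percentage point a year whenever the portfolio tracks its benchmark closely, which is why a respectable information ratio does not by itself imply a large standalone alpha.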

The steelman’s verdict, then, is a moderate grade — and that verdict is itself informative. The best honest case for Markov-model trading is not nothing: regime classification, properly used inside a disciplined allocation framework with realistic costs, demonstrably improves risk-adjusted performance out-of-sample. But the best case is also not a strong-grade, live-confirmed, independently-replicated standalone-alpha result — because no such result was found, despite a deliberate search for it. The steelman therefore confirms rather than overturns the vault’s conclusion: the genuine, defensible value of Markov models is regime-aware risk control and allocation, the Live Regime-Model Evidence Gap for standalone profitable systems remains unfilled, and marketing-grade claims of a profitable “Markov trading system” stay unproven. If a future round surfaces an independent replication of the Shu & Mulvey factor-allocation result, or a disclosed live track record, this note should be revisited — that would be the evidence that moves the grade to strong.

Connections

Shu and Mulvey 2024 Dynamic Factor Allocation — reports_profitability, the single strongest steelman result; OOS 2007-2024, costed, IR 0.05→0.44, source: https://arxiv.org/abs/2410.14841
Shu Yu and Mulvey 2024 — reports_profitability, sibling jump-model timing study, also moderate, source: https://doi.org/10.1057/s41260-024-00376-x
Wang Lin Mikhelson 2020 — suffers_overfitting_risk, examined but rejected from the supporting tier as a cost-free leveraged backtest, source: https://www.mdpi.com/1911-8074/13/12/311
Macri et al 2025 — reports_profitability, regime probabilities improve RL trader profit/robustness (mostly simulation), source: https://arxiv.org/abs/2511.00190
Statistical Jump Model — detects_regime, the model behind the strongest steelman result, source: https://arxiv.org/abs/2410.14841
Regime Classification — relates, the genuine value the steelman confirms is regime-aware risk control, source: https://arxiv.org/abs/2410.14841
Tactical Asset Allocation — relates, regime signals add value inside a disciplined allocation framework, source: https://arxiv.org/html/2410.14841v1
Backtest-to-Live Performance Gap — contradicts, every steelman study is a backtest; the gap caps the grade, source: https://arxiv.org/abs/2410.14841
Live Regime-Model Evidence Gap — contradicts, the steelman found no live or independently replicated result, source: https://www.preqin.com/insights/research/blogs/the-rise-of-the-machines-ai-funds-are-outperforming-the-hedge-fund-benchmark
Out-of-Sample Backtesting — relates, the steelman result clears OOS testing but not the live tier, source: https://arxiv.org/abs/2410.14841

Sources

Shu, Y. & Mulvey, J. M. (2024). “Dynamic Factor Allocation Leveraging Regime-Switching Signals.” arXiv:2410.14841. https://arxiv.org/abs/2410.14841
Shu, Y., Yu, C. & Mulvey, J. M. (2024). “Downside Risk Reduction Using Regime-Switching Signals: A Statistical Jump Model Approach.” Journal of Asset Management. https://doi.org/10.1057/s41260-024-00376-x
Macri, A. et al. (2025). “Deep Reinforcement Learning for Optimal Trading with Partial Information.” arXiv:2511.00190. https://arxiv.org/abs/2511.00190
Wang, M., Lin, Y.-H. & Mikhelson, I. (2020). “Regime-Switching Factor Investing with Hidden Markov Models.” Journal of Risk and Financial Management, 13(12), 311. https://www.mdpi.com/1911-8074/13/12/311
Preqin (2019). “The Rise of the Machines: AI Funds Are Outperforming the Hedge Fund Benchmark.” https://www.preqin.com/insights/research/blogs/the-rise-of-the-machines-ai-funds-are-outperforming-the-hedge-fund-benchmark
