Open Questions — Markov Trading Model Profitability

The standing research backlog. Questions move to Answered as P5 rounds close them; new questions are appended as research surfaces them.

Core questions (from the research brief)

Is the popular claim that “a Markov trading model predicts future price action or regime from the current state alone, ignoring older data” technically accurate for (a) simple Markov chains, (b) hidden Markov models (HMMs), (c) Markov regime-switching models, and (d) Markov decision processes (MDPs)? (Partly addressed in the seed round: each model note states its memory assumption, but a consolidated answer is still needed.)
Which Markov-based approaches have actually been tested in trading, and on which markets/instruments?
Do they produce positive returns after realistic transaction costs and slippage?
Are returns robust out-of-sample, across market regimes, and across instruments?
Are Markov models better used for regime detection, signal filtering, risk control, or direct trade generation?
What evidence exists from peer-reviewed papers, working papers, reproducible code, or live/production use (as opposed to backtests)?
Where do Markov models fail: non-stationarity, overfitting, state definition, transition instability, latency, costs, or regime misclassification?
What is the fairest overall conclusion: profitable model, useful component, or mostly academic toy?
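For the first core question, the difference between (a) and (b) can be shown in a few lines. This is a minimal sketch assuming a hypothetical two-state chain with illustrative transition probabilities; the function names are made up for the example. For an observable Markov chain, the k-step forecast is just the current state's row of P^k. An HMM's filtered regime belief comes from the forward recursion, so it depends on the whole observation history even though the hidden state itself is Markov.

```python
import numpy as np

# Hypothetical two-state transition matrix (state 0 = "bear", 1 = "bull").
# Row i holds P(next state = j | current state = i); values are illustrative only.
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])


def markov_forecast(P, current_state, k):
    """k-step-ahead state distribution of an *observable* Markov chain.

    Only the current state is used: the answer is row `current_state` of P^k.
    """
    return np.linalg.matrix_power(P, k)[current_state]


def hmm_filter(P, emission_likelihoods, init):
    """Forward (filtering) recursion for a hidden Markov model.

    emission_likelihoods[t, j] = p(observation t | hidden state j).
    Returns p(hidden state at the last step | all observations so far).
    The hidden state is Markov, but this belief summarizes the entire history.
    """
    belief = np.asarray(init, dtype=float) * emission_likelihoods[0]
    belief /= belief.sum()
    for t in range(1, len(emission_likelihoods)):
        belief = (belief @ P) * emission_likelihoods[t]  # predict, then update
        belief /= belief.sum()
    return belief


# Two observation histories that differ only in their first observation.
history_a = np.array([[0.9, 0.1], [0.5, 0.5]])
history_b = np.array([[0.1, 0.9], [0.5, 0.5]])
print(markov_forecast(P, current_state=0, k=2))   # [0.61 0.39]
print(hmm_filter(P, history_a, init=[0.5, 0.5]))  # [0.67 0.33]
print(hmm_filter(P, history_b, init=[0.5, 0.5]))  # [0.43 0.57]
```

Both HMM beliefs end on an identical final observation yet differ. In that sense "ignoring older data" is inaccurate for (b), and for (c), whose Hamilton filter uses the same forward logic: older observations are compressed into the current belief rather than discarded.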

Live evidence gap. Every seed-round source is a backtest, simulation, or academic study. Is there any credible disclosure of Markov-family models used in live production trading (named funds, broker/exchange material)?
Costs methodology (see Transaction Costs and Slippage). What cost and slippage assumptions do the better studies use, and how sensitive are their conclusions to them?
Overfitting quantification (see Overfitting in Quantitative Trading, Data-Snooping Bias). Does the Markov-trading literature apply formal multiple-testing corrections such as White’s Reality Check, Hansen’s Superior Predictive Ability (SPA) test, or the deflated Sharpe ratio?
Regime vs alpha (see Regime Classification). Can regime labels be converted into directional alpha that beats a benchmark after costs, or is risk control their only robust use?
Reproducibility. How much of the RL-trading and HMM-trading literature releases code, and do independent replications confirm the headline results?
Statistical jump model. The Statistical Jump Model beat the HMM in one 2024 study; has this finding been replicated elsewhere?
Baseline comparison. Which studies compare against honest baselines (buy-and-hold, moving-average crossover, ARIMA/GARCH, momentum) rather than weak strawmen?
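The cost, regime-versus-alpha, and baseline questions can be made concrete with a small backtest harness. This is a minimal sketch under stated assumptions, and all function names are hypothetical:

- Binary regime labels (1 = favorable, 0 = adverse) are known at the close of each period. That means filtered rather than smoothed estimates, because smoothed labels use future observations and leak look-ahead into a backtest.
- Costs are proportional, in basis points per unit of position change, with slippage folded into that rate and no market-impact term.
- Buy-and-hold is the baseline.

```python
import numpy as np

# Hypothetical helpers for illustration; not taken from any reviewed study.


def regime_positions(regime_labels, mode="risk_control"):
    """Map regime labels to positions: long/cash ("risk_control") or long/short ("directional")."""
    labels = np.asarray(regime_labels)
    if mode == "risk_control":
        return np.where(labels == 1, 1.0, 0.0)
    if mode == "directional":
        return np.where(labels == 1, 1.0, -1.0)
    raise ValueError(f"unknown mode: {mode}")


def net_returns(positions, asset_returns, cost_bps=0.0):
    """Per-period returns after a proportional cost, with a one-period lag.

    positions[t] is set at the close of period t and earns asset_returns[t + 1];
    each change of position costs cost_bps basis points per unit traded.
    """
    positions = np.asarray(positions, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)
    gross = positions[:-1] * asset_returns[1:]
    # Units traded when each position is set (starting flat); the final trade earns nothing.
    turnover = np.abs(np.diff(positions, prepend=0.0))[:-1]
    return gross - cost_bps / 1e4 * turnover


def break_even_cost_bps(positions, asset_returns):
    """Cost per unit traded (bps) at which the total net return falls to zero."""
    gross = net_returns(positions, asset_returns, 0.0).sum()
    turnover = np.abs(np.diff(np.asarray(positions, dtype=float), prepend=0.0))[:-1].sum()
    return np.inf if turnover == 0 else gross / turnover * 1e4


def max_drawdown(period_returns):
    """Largest peak-to-trough fall in compounded wealth, as a fraction of the peak."""
    wealth = np.cumprod(1.0 + np.asarray(period_returns, dtype=float))
    peak = np.maximum.accumulate(np.maximum(wealth, 1.0))  # initial wealth of 1 counts as a peak
    return float((1.0 - wealth / peak).max())


labels = [1, 0, 1, 1]
asset_returns = [0.0, 0.02, -0.05, 0.03]
buy_and_hold = net_returns(np.ones(len(asset_returns)), asset_returns)
for mode in ("risk_control", "directional"):
    r = net_returns(regime_positions(labels, mode), asset_returns, cost_bps=10)
    print(mode, round(r.sum(), 4), round(max_drawdown(r), 4))
print("buy_and_hold", round(buy_and_hold.sum(), 4), round(max_drawdown(buy_and_hold), 4))
```

On real data, the entry's question becomes whether the directional variant still beats buy-and-hold out-of-sample once costs are charged, or whether only the risk-control variant's drawdown reduction survives. The break-even cost gives a single number to compare against realistic cost estimates for the instrument.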
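For the overfitting question, here is a minimal sketch of the deflated Sharpe ratio as we understand it from Bailey and López de Prado (2014). The observed Sharpe ratio is tested against the Sharpe ratio expected from the best of N unskilled trials rather than against zero, with an adjustment for skewness and kurtosis. Inputs are per-period (non-annualized) Sharpe ratios. The researcher must estimate the number of independent trials N and the cross-trial variance of Sharpe ratios, and that is the hard part in practice. The function names are hypothetical; check the formula against the original paper before relying on it.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
_STD_NORMAL = NormalDist()


def expected_max_sharpe(sr_variance, n_trials):
    """Approximate expected maximum Sharpe ratio among n_trials independent
    strategies with zero true skill, given the variance of their Sharpe ratios."""
    if n_trials < 2:
        raise ValueError("need at least two trials")
    z = _STD_NORMAL.inv_cdf
    return math.sqrt(sr_variance) * (
        (1 - EULER_GAMMA) * z(1 - 1 / n_trials)
        + EULER_GAMMA * z(1 - 1 / (n_trials * math.e))
    )


def probabilistic_sharpe(sr, sr_benchmark, n_obs, skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe ratio exceeds sr_benchmark, given an observed
    per-period Sharpe ratio sr over n_obs returns. kurtosis is raw (3 for normal)."""
    denom = math.sqrt(1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2)
    return _STD_NORMAL.cdf((sr - sr_benchmark) * math.sqrt(n_obs - 1) / denom)


def deflated_sharpe(sr, sr_variance, n_trials, n_obs, skew=0.0, kurtosis=3.0):
    """Probabilistic Sharpe ratio against the expected best-of-N-trials benchmark."""
    benchmark = expected_max_sharpe(sr_variance, n_trials)
    return probabilistic_sharpe(sr, benchmark, n_obs, skew, kurtosis)


# Same observed per-period Sharpe ratio, judged after 10 vs 1,000 tried configurations.
for n in (10, 1000):
    print(n, round(deflated_sharpe(sr=0.1, sr_variance=0.001, n_trials=n, n_obs=1000), 3))
```

The deflated probability falls sharply as N grows, which is why an undisclosed number of tried state definitions, feature sets, or regime counts makes a headline Sharpe ratio hard to interpret.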

Do 2024-2025 advances change the verdict? Reviewed in Recent Developments 2024-2025: neural regime models, the FinRL benchmark ecosystem, generative/world-model market simulation, and continued jump-model refinement. Finding — better tooling and more honest evaluation, but no change to the verdict; the recurring weaknesses (non-stationarity, overfitting, costs, no live evidence) all survive.
Synthetic-data validation. Generative world models (VAE/GAN/diffusion) train policies inside a generated market — does validating the synthetic distribution against the live future just re-introduce the non-stationarity problem the technique was meant to escape? Likely yes; flagged as a subtler form of Data-Snooping Bias.
FinRL live trading. FinRL Contests evaluate on withheld historical out-of-sample data rather than on disclosed live-capital results. Is there any FinRL-based strategy with audited live results? Not found; the Live Regime-Model Evidence Gap remains open for RL too.

Statistical jump model replication. Partly addressed (round 10): the Statistical Jump Model line of work (Shu, Yu and Mulvey 2024; Aydınhan, Kolm, Mulvey and Shu 2024) now has open-source code (jumpmodels) and continues into 2025 dynamic-factor-allocation work. However, it remains within the same Princeton/Kolm research network, and its consistent, defensible claim is still downside-risk reduction, not benchmark-beating alpha. It still awaits fully independent replication.
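To make the model under discussion concrete: as we understand it, the statistical jump model fits regime centroids while charging a fixed penalty for every change of state. It minimizes a clustering loss plus that jump penalty, alternating between updating the centroids and solving for the best state sequence by dynamic programming. The sketch below is a hypothetical illustration of that objective, using a 1-D feature, squared-error loss, and a deterministic quantile initialization. It is not the jumpmodels package API, and it omits the feature engineering and refinements used in the cited papers.

```python
import numpy as np

# Hypothetical illustration of the jump-model objective; not the jumpmodels package API.


def optimal_states(X, centroids, jump_penalty):
    """Best state sequence for fixed centroids, by dynamic programming.

    Minimizes sum_t ||x_t - centroid[s_t]||^2 + jump_penalty * (number of switches).
    """
    T, K = len(X), len(centroids)
    loss = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)  # shape (T, K)
    switch_cost = jump_penalty * (1.0 - np.eye(K))  # [i, j] = cost of moving i -> j
    cost = loss[0].copy()
    backpointer = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = cost[:, None] + switch_cost  # previous state i, new state j
        backpointer[t] = total.argmin(axis=0)
        cost = total.min(axis=0) + loss[t]
    states = np.empty(T, dtype=int)
    states[-1] = int(cost.argmin())
    for t in range(T - 1, 0, -1):
        states[t - 1] = backpointer[t, states[t]]
    return states


def fit_jump_model(X, n_states=2, jump_penalty=10.0, max_iter=50):
    """Alternate between state-sequence and centroid updates until the states stop changing."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    # Deterministic initialization at interior quantiles (a simplifying assumption;
    # a fuller implementation would typically try several initializations).
    qs = np.linspace(0.0, 1.0, n_states + 2)[1:-1]
    centroids = np.quantile(X, qs, axis=0)
    states = None
    for _ in range(max_iter):
        new_states = optimal_states(X, centroids, jump_penalty)
        if states is not None and np.array_equal(new_states, states):
            break
        states = new_states
        for k in range(n_states):
            if np.any(states == k):
                centroids[k] = X[states == k].mean(axis=0)
    return states, centroids


# Two calm regimes with one isolated spike at t = 10.
x = np.r_[np.zeros(30), np.full(30, 4.0)]
x[10] = 4.0
for penalty in (0.0, 50.0):
    states, _ = fit_jump_model(x, jump_penalty=penalty)
    print(penalty, int(np.count_nonzero(np.diff(states))))  # number of regime switches
```

With no penalty, the isolated spike produces three switches. With a penalty of 50, the spike is absorbed and only the genuine regime change remains. This persistence is the mechanism behind the model's fewer, more stable regime calls, and it is a tuning choice that an independent replication would need to revisit.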