Recent Developments 2024-2025

This note tracks the 2024-2025 wave of research that touches Markov-family trading models — deep-learning regime detection, the maturing FinRL reinforcement-learning ecosystem, generative / “world-model” market simulation, and recent regime-switching / jump-model advances — and asks the question that organises this whole vault: does any of it change the verdict? The short answer is no. The newest work delivers genuinely better tooling and, in places, more honest evaluation, but it repeats the established pattern almost exactly: stronger backtests, the same unsolved structural problems — non-stationarity, overfitting, transaction costs, and a near-total absence of disclosed live evidence. The 2024-2025 literature is best read as converging on the vault’s existing conclusions from new directions, not overturning them.

The clearest visible trend is the fusion of neural networks with regime models. Where the seed-round literature treated the HMM and the Markov Regime-Switching Model as standalone econometric objects, 2024-2025 work routinely wraps them in deep learning: neural-HMM hybrids, transformer/attention-based regime classifiers, and LLM-assisted regime labelling (see Neural Regime Model). The CFA Institute’s 2025 AI in Asset Management chapter on deep learning frames the consensus precisely: DL “does not claim to replace classic quant methods”, rather “hybrids work best: neural surrogates wrapped around established models.” This is the encouraging side of the story. Regime detection is being made richer (time-varying transition probabilities conditioned on exogenous variables, attention over multi-scale order-flow features), and the field has openly admitted that early deep-learning trading “ignored market microstructure, costs, and controls.” But the evidence these hybrids produce has not improved in kind. The Monteiro (2024) HMM-plus-neural-network energy strategy is representative: a headline 83% COVID-window return that, read honestly, came with a Sharpe ratio of just 0.77 and an information ratio of -0.1 (its average return trailed its own benchmark’s), and it was tested on a single sector, in a single window, with no cross-regime out-of-sample test. A better model architecture wrapped around a weaker-than-benchmark result.

Recent Developments 2024-2025 [relates] Hidden Markov Model Regime Detection
Neural Regime Model [part-of] Recent Developments 2024-2025
Recent Developments 2024-2025 [relates] Regime Classification
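The Sharpe/information-ratio distinction in the Monteiro example is easy to blur, so here is a minimal Python sketch on hypothetical monthly returns (invented for illustration, not Monteiro's data). Sharpe divides mean excess return by the strategy's own volatility; the information ratio divides mean active return (strategy minus benchmark) by tracking error. The function names and the annualisation convention (square root of 12 for monthly data, zero risk-free rate) are assumptions of this sketch.

```python
import numpy as np

def sharpe_ratio(returns, rf=0.0, periods_per_year=12):
    """Annualised Sharpe ratio: mean excess return over rf per unit of its own volatility."""
    excess = np.asarray(returns, dtype=float) - rf
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)

def information_ratio(returns, benchmark, periods_per_year=12):
    """Annualised information ratio: mean active return per unit of tracking error."""
    active = np.asarray(returns, dtype=float) - np.asarray(benchmark, dtype=float)
    return np.sqrt(periods_per_year) * active.mean() / active.std(ddof=1)

# Hypothetical monthly returns (illustrative only, not from any cited study).
benchmark = [0.03, -0.01, 0.02, 0.01, -0.02, 0.04, 0.01, 0.00, 0.02, -0.01, 0.03, 0.01]
strategy = [0.02, -0.01, 0.01, 0.01, -0.01, 0.03, 0.01, 0.00, 0.01, -0.01, 0.02, 0.01]

print(f"strategy Sharpe: {sharpe_ratio(strategy):+.2f}")                  # positive
print(f"strategy IR:     {information_ratio(strategy, benchmark):+.2f}")  # negative
```

A defensive strategy like this one can look respectable on Sharpe while still trailing its benchmark, which is why a positive Sharpe alone does not establish alpha.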

The second trend is the maturation of the deep-RL trading ecosystem, embodied by FinRL and its 2023-2025 benchmarking contests. This is the most genuine progress in the period. FinRL gives the field something it badly lacked: standardised MDP task definitions, shared gym-style market environments, GPU-parallel simulation, and withheld out-of-sample evaluation data, which together directly attack the reproducibility deficit that Millea 2021 documented. And precisely because the FinRL Contests evaluate honestly, they reproduce the vault’s verdict rather than challenge it. The 2023 contest’s winning agents beat the Dow Jones index on Sharpe and drawdown in the pre-deadline window but underperformed the index out-of-sample in the post-deadline window, where two of the three winners posted outright negative returns. The 2025 FinRL-DeepSeek contest produced eye-catching cumulative returns (190-340%) paired with maximum drawdowns as deep as -92%. The FinRL papers themselves name “policy instability” (extreme sensitivity to hyperparameters, random seeds and market noise) as a core, unsolved problem. Better infrastructure has made the failure modes more visible, not made them go away. The period’s two large RL surveys, the 167-paper Evolution of RL in Quantitative Finance (2024) and the 167-article RL in Financial Decision Making systematic review (2025), converge on the same message: RL “shows promise” but the Sim-to-Real Gap, Non-Stationarity, sample inefficiency, interpretability and overfitting remain unsolved.

Recent Developments 2024-2025 [relates] Reinforcement Learning Trading Policy
FinRL Framework and Contests [part-of] Recent Developments 2024-2025
FinRL Framework and Contests [supports] Replication Crisis in Quantitative Finance
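Why a 190-340% cumulative return and a -92% maximum drawdown can describe the same run: a minimal sketch of the two standard definitions on a made-up return path (not FinRL contest data). Cumulative return compounds the whole path, while maximum drawdown is the worst fall of the equity curve from its running peak. The function names are hypothetical.

```python
import numpy as np

def equity_curve(returns):
    """Compound simple per-period returns into an equity curve starting at 1.0."""
    return np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))))

def cumulative_return(returns):
    """Total compounded return over the whole path (2.2 means +220%)."""
    return equity_curve(returns)[-1] - 1.0

def max_drawdown(returns):
    """Worst peak-to-trough fall of the equity curve, as a negative fraction."""
    equity = equity_curve(returns)
    running_peak = np.maximum.accumulate(equity)
    return (equity / running_peak - 1.0).min()

# Hypothetical per-period returns: a run-up, a crash, then a recovery.
returns = [1.0, 1.5, 1.0, -0.6, -0.8, 3.0]
print(f"cumulative return: {cumulative_return(returns):.0%}")  # 220%
print(f"max drawdown:      {max_drawdown(returns):.0%}")       # -92%
```

A -92% drawdown means the strategy must gain 1150% (1/0.08 - 1) just to regain its prior peak, so the headline cumulative figure says little about whether the path was survivable.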

The third trend is generative and “world-model” approaches: VAEs, GANs and diffusion models used to synthesise market data and simulate market dynamics, surveyed in the 2025 From Deep Learning to LLMs paper. The motivation is sound: real financial data is scarce, non-stationary and low-signal, so synthetic paths are attractive for training, robustness testing and stress scenarios; the CFA chapter endorses VAE/GAN synthetic data for exactly this. But when the question is profitability, this carries a sharp and under-acknowledged risk. A model that learns its policy inside a generated world inherits whatever biases the generator encodes, and validating that the synthetic distribution matches the live future is the very non-stationarity problem the technique was meant to escape. Generative world models are a promising research direction for risk testing; treating a backtest against synthetic data as evidence of tradeable alpha would be a new and subtler form of data snooping.

Recent Developments 2024-2025 [relates] Non-Stationarity
Non-Stationarity [contradicts] Recent Developments 2024-2025
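A toy illustration of the generator-bias point, assuming a Gaussian fitted to a calm window stands in for a VAE, GAN or diffusion model, and that the later "live" window is simply more volatile. Both windows are simulated here, not real market data. The generator faithfully reproduces the regime it was trained on, so a 1% tail threshold learned from its synthetic paths is breached far more often than 1% once the regime shifts.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical history: a calm training window, then a later, more volatile "live" window.
train_window = rng.normal(0.0005, 0.01, size=1000)
live_window = rng.normal(-0.0005, 0.03, size=1000)

# Stand-in generator (assumption): a Gaussian fitted to the training window only.
mu, sigma = train_window.mean(), train_window.std(ddof=1)
synthetic = rng.normal(mu, sigma, size=100_000)

# The 1% tail threshold a model trained on synthetic paths would learn.
tail_threshold = np.quantile(synthetic, 0.01)

# How often the live window actually breaches that threshold (nominally 1%).
breach_rate = np.mean(live_window < tail_threshold)
print(f"synthetic 1% threshold: {tail_threshold:.4f}")
print(f"live breach rate: {breach_rate:.1%} (nominal 1.0%)")
```

Swapping in a richer generator improves the fit to the training window but does not by itself remove the problem: it still learns only from the history it is shown.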

The fourth trend is incremental regime-switching / jump-model refinement, the steadiest and most credibly evidenced strand. The Statistical Jump Model line of work — Shu Yu and Mulvey 2024, the continuous jump model of Aydınhan Kolm Mulvey Shu 2024, and the 2025 dynamic-factor-allocation extensions — continues to mature, now with open-source code (the jumpmodels Python package), realistic transaction costs, trading delays and time-series cross-validation. Notably, these papers stay disciplined about their own claims: their consistent, defensible finding is downside-risk reduction — lower volatility and shallower drawdowns — not benchmark-beating directional alpha. This is exactly the asymmetry established in Regime Classification: regime models are a confirmed risk-filter layer and an inconclusive-to-weak alpha source. The 2024-2025 jump-model work is genuine, careful progress — but it is progress within the vault’s existing verdict, not against it.

Aydınhan Kolm Mulvey Shu 2024 [supports] Statistical Jump Model
Statistical Jump Model [part-of] Recent Developments 2024-2025
Recent Developments 2024-2025 [supports] Live Regime-Model Evidence Gap
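For readers unfamiliar with the jump-model idea, a minimal sketch of the discrete statistical jump model objective: choose regime centroids θ_k and a state path s_t to minimise the sum over t of ‖x_t − θ_{s_t}‖² plus λ times the number of regime switches, so the penalty λ buys regime persistence. The fit alternates between a dynamic-programming state assignment and per-state centroid updates. This assumes squared-Euclidean loss and a single fixed λ; it is not the jumpmodels package's API, the function names are hypothetical, and the cited work adds richer feature sets, a continuous-state variant, and time-series cross-validation.

```python
import numpy as np

def assign_states(X, centroids, jump_penalty):
    """Dynamic programming: state path minimising squared-error loss plus a penalty per switch."""
    T, K = X.shape[0], centroids.shape[0]
    # loss[t, k]: squared Euclidean distance from observation t to centroid k.
    loss = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    switch = jump_penalty * (1.0 - np.eye(K))  # switch[j, k]: cost of moving j -> k
    cost = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    cost[0] = loss[0]
    for t in range(1, T):
        total = cost[t - 1][:, None] + switch  # total[j, k]: reach state k from state j
        back[t] = total.argmin(axis=0)
        cost[t] = loss[t] + total.min(axis=0)
    # Backtrack the optimal path from the cheapest final state.
    states = np.empty(T, dtype=int)
    states[-1] = cost[-1].argmin()
    for t in range(T - 1, 0, -1):
        states[t - 1] = back[t, states[t]]
    return states

def fit_jump_model(X, n_states=2, jump_penalty=1.0, n_iter=20, seed=0):
    """Coordinate descent: alternate DP state assignment and per-state centroid means."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_states, replace=False)]  # random initial centroids
    states = None
    for _ in range(n_iter):
        new_states = assign_states(X, centroids, jump_penalty)
        if states is not None and np.array_equal(new_states, states):
            break  # state path stopped changing
        states = new_states
        for k in range(n_states):
            if np.any(states == k):  # leave empty states' centroids unchanged
                centroids[k] = X[states == k].mean(axis=0)
    return states, centroids

# Usage on a synthetic two-regime series: mean +1 for 100 steps, then -1,
# with deterministic bounded noise so the example is reproducible.
t = np.arange(200)
X = np.where(t < 100, 1.0, -1.0) + 0.3 * np.sin(t)
states, centroids = fit_jump_model(X, n_states=2, jump_penalty=5.0)
print("switch points:", np.flatnonzero(states[1:] != states[:-1]) + 1)  # expected: [100]
```

Regime labels are arbitrary (state 0 is not necessarily the calmer regime), and a larger λ yields fewer, longer regimes. An in-sample fit like this is only the starting point; the risk-filter evidence described above rests on out-of-sample evaluation with trading delays and transaction costs layered on top.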

The honest synthesis for 2026: the 2024-2025 period delivered real progress in tooling and methodological honesty — reproducible benchmarks, neural-augmented regime models, generative stress-testing data, cost-aware deep-hedging — and a notable cultural shift toward admitting that costs, microstructure and governance are non-negotiable. But on the vault’s actual research question it changed nothing. Every recurring weakness from the seed rounds survives intact: non-stationarity still breaks the fixed-MDP assumption; overfitting and seed-sensitivity still inflate backtests; transaction costs still erode gross edges; and the live-evidence gap is unclosed — even the most infrastructure-rich 2025 work (FinRL Contests) evaluates on withheld historical data, not disclosed live capital. Better backtests, same unsolved problems.

Connections

Sources