Macri et al 2025
A 2025 working paper (arXiv:2511.00190) studying an optimal trading problem in which the tradeable signal follows an Ornstein-Uhlenbeck process whose mean-reversion speed, level and volatility switch according to a hidden Markov regime process. The authors compare three Deep Deterministic Policy Gradient agents, each augmented with a Gated Recurrent Unit network, that differ only in what they are told about the latent regime: a one-step agent encoding raw GRU hidden states (hid-DDPG), a two-step agent fed posterior regime-probability estimates (prob-DDPG), and a two-step agent fed next-step signal forecasts (reg-DDPG). Transaction costs are charged per unit volume traded.
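The signal model described above can be sketched in a few lines. This is a minimal illustration, not the paper's code: it assumes an Euler-Maruyama discretisation, and the function names `simulate_regime_ou` and `step_reward`, the parameter values, and the exact reward form (mark-to-market P&L on inventory minus a cost proportional to volume traded) are all hypothetical choices made for the example.

```python
import math
import random


def simulate_regime_ou(n_steps, dt, kappa, theta, sigma, trans, x0=0.0, z0=0, seed=0):
    """Euler-Maruyama path of an OU signal whose parameters follow a Markov chain.

    kappa, theta, sigma: per-regime mean-reversion speed, level and volatility.
    trans[i][j]: one-step probability of moving from regime i to regime j.
    Returns the signal path and the (normally unobserved) regime path.
    """
    rng = random.Random(seed)
    x, z = x0, z0
    xs, zs = [x], [z]
    for _ in range(n_steps):
        # Sample the next hidden regime from row z of the transition matrix.
        z = rng.choices(range(len(trans)), weights=trans[z])[0]
        # OU update: drift towards theta[z] plus Gaussian noise scaled by sqrt(dt).
        x += kappa[z] * (theta[z] - x) * dt + sigma[z] * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        xs.append(x)
        zs.append(z)
    return xs, zs


def step_reward(inventory, trade, dx, cost_per_unit):
    """Hypothetical one-step reward: P&L on held inventory minus a per-unit-volume cost."""
    return inventory * dx - cost_per_unit * abs(trade)


# Illustrative two-regime parameters (assumed, not taken from the paper).
xs, zs = simulate_regime_ou(
    n_steps=500, dt=0.01,
    kappa=[5.0, 1.0], theta=[-0.5, 0.5], sigma=[0.2, 0.4],
    trans=[[0.98, 0.02], [0.03, 0.97]], seed=1,
)
```

The agent only ever sees `xs`; the regime path `zs` is returned here so that a simulated study can check how well any regime estimate tracks the truth.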
The paper appears in this vault as steelman evidence for the reinforcement-learning model class. Its central finding is favourable to regime models: the prob-DDPG agent, the one given explicit posterior probabilities over the latent Markov regimes, achieves the highest cumulative reward and the most interpretable strategy, while reg-DDPG (next-step forecasts) “provides limited benefits” and hid-DDPG sits in between. The authors conclude that “embedding probabilistic insights into latent regimes substantially improves both profitability and robustness of reinforcement learning-based trading strategies.” This is a clean mechanism result: structured regime information, not just more data, is what helps the RL trader.
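To make "posterior probabilities over the latent Markov regimes" concrete, the sketch below computes them with a classical forward (Hamilton-style) filter. This is a swapped-in textbook technique, not the paper's estimator: the note above says the agents rely on a GRU, whereas this example assumes the regime parameters and transition matrix are known and that the signal follows an Euler-discretised OU step. The names `normal_pdf` and `filter_regimes` and all numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def filter_regimes(xs, dt, kappa, theta, sigma, trans, prior):
    """Forward filter: P(regime at t | signal up to t) for each step of the path xs."""
    n = len(prior)
    post = list(prior)
    history = [post]
    for x_prev, x_next in zip(xs, xs[1:]):
        # Predict: push the previous posterior through the transition matrix.
        pred = [sum(post[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update: weight each regime by the likelihood of the observed move
        # under that regime's Euler-discretised OU step.
        lik = [
            normal_pdf(x_next, x_prev + kappa[j] * (theta[j] - x_prev) * dt, sigma[j] * math.sqrt(dt))
            for j in range(n)
        ]
        unnorm = [pred[j] * lik[j] for j in range(n)]
        total = sum(unnorm)
        # Fall back to the prediction if every likelihood underflows to zero.
        post = [u / total for u in unnorm] if total > 0 else pred
        history.append(post)
    return history


# Toy path that drifts upward, so the high-level regime (index 1) should dominate.
probs = filter_regimes(
    xs=[0.0, 0.5, 0.75], dt=1.0,
    kappa=[0.5, 0.5], theta=[-1.0, 1.0], sigma=[0.1, 0.1],
    trans=[[0.9, 0.1], [0.1, 0.9]], prior=[0.5, 0.5],
)
```

A vector like `probs[-1]` is the kind of input the prob-DDPG agent is described as receiving, in place of raw hidden states or a point forecast.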
The reason it grades weak rather than moderate is the nature of the evidence. The bulk of the study is simulation: episodes are generated from the assumed regime-switching OU model, so the agent is tested on data drawn from exactly the process it is designed for, and a regime-aware agent winning is close to tautological. The only real-market test is a single brief equity pair-trading demonstration on two co-integrated NASDAQ instruments (Intel and a semiconductor ETF) over roughly one week from late August into September 2025. There is no multi-year out-of-sample track record, no benchmark-beating claim with drawdown metrics, no annualised costed return, and no independent replication. It substantiates the idea that latent-regime structure improves RL trading; it does not substantiate a tradeable, profitable RL strategy.
Macri et al 2025 [supports] Reinforcement Learning Trading Policy
Macri et al 2025 [tests_strategy] Recurrent Reinforcement Learning Trading
Macri et al 2025 [supports] Steelman — Best Case for Markov Trading Models
Macri et al 2025 [relates] Regime Classification
Connections
- Reinforcement Learning Trading Policy — proposes_model, DDPG+GRU agents for trading under partial information, source: https://arxiv.org/html/2511.00190v1
- Recurrent Reinforcement Learning Trading — tests_strategy, recurrent (GRU) RL trading agent, source: https://arxiv.org/html/2511.00190v1
- Steelman — Best Case for Markov Trading Models — supports, mechanism evidence that regime structure helps RL traders, source: https://arxiv.org/abs/2511.00190
- Regime Classification — detects_regime, posterior regime probabilities are the helpful input, source: https://arxiv.org/html/2511.00190v1
- Markov Regime-Switching Model — relates, signal parameters switch on a hidden Markov regime process, source: https://arxiv.org/html/2511.00190v1
- Transaction Costs and Slippage — includes_costs, per-unit-volume transaction cost in the reward, source: https://arxiv.org/html/2511.00190v1
- Sim-to-Real Gap — relates, evidence is mostly simulation with a thin real-data demo, source: https://arxiv.org/abs/2511.00190