Sim-to-Real Gap

The sim-to-real gap is the discrepancy between a trading agent’s performance in a historical-data simulation and its performance in a live market. Backtests typically assume the agent’s own orders do not move prices (no market impact), that orders fill instantly at the close or mid-price, and that costs are a simple fixed rate — assumptions that break in live trading, especially at institutional volume. Sun Wang An 2021 notes that learning by directly interacting with the real market is risky and impractical, so reinforcement-learning trading is trained almost entirely offline on historical data; this is why the literature is dominated by backtests and has almost no disclosed live-trading evidence.

The gap is the reinforcement-learning-specific instance of the broader Backtest-to-Live Performance Gap. The danger is not merely that reported returns shrink; it is that the agent optimises against the simulator itself. If the environment misprices liquidity, the policy that maximises simulated reward is one that exploits that mispricing, so the cost model is part of what is learned, not a footnote applied afterwards. Abbade and Reali Costa 2026 demonstrate this directly: replacing a flat 10 bp fee with an Almgren–Chriss market-impact model in open-source RL backtest environments cut one agent’s turnover from 19% to 1% and another’s out-of-sample return from 34% to 25%. Different cost assumptions therefore produce different policies, not just different P&L. An agent that learned to trade frequently because the simulator made trading nearly free has, by construction, no live edge.

Aggregate live evidence corroborates the gap. Buczynski, Cuzzolin and Sahakian’s review of 27 ML market-forecasting experiments found a literature claiming >90% accuracy but “conspicuously lacking in high-profile success cases”, while prominent ML-driven funds (Aidyia, Sentient Technologies) liquidated within months of launch and the Eurekahedge AI Hedge Fund Index underperformed the S&P 500 over fifteen years of real-money trading. Marcos López de Prado’s The 10 Reasons Most Machine Learning Funds Fail gives the practitioner mechanism: research-through-backtesting and single-path walk-forward testing manufacture simulated edges that do not survive deployment.

For this vault the sim-to-real gap is why a Markov-decision-process or reinforcement-learning agent with a strong simulated Sharpe ratio cannot be graded above weak-to-moderate without realistic costs and — critically — Live Trading Evidence or independent replication. No source reviewed provides a disclosed live track record for a standalone Markov/MDP/RL trading agent, so the gap remains uncrossed for every such claim in the vault.

Connections

Sim-to-Real Gap [part-of] Backtest-to-Live Performance Gap
Transaction Costs and Slippage [causes] Sim-to-Real Gap
Abbade and Reali Costa 2026 [supports] Sim-to-Real Gap
Sim-to-Real Gap [contradicts] Live Trading Evidence

Sources