Cryptocurrency Market

The cryptocurrency market is the set of venues on which digital assets — chiefly Bitcoin and Ethereum — trade continuously. It is the dominant testbed for Markov-family and reinforcement-learning trading research, and the reasons are structural rather than evidential. Crypto trades 24/7 with no closing auction or overnight gap, so an RL agent can be trained and evaluated on an unbroken time series; it is extremely volatile, which makes any timing signal look consequential in a backtest; it generates abundant high-frequency data (five-minute bars, full limit-order-book and on-chain feeds) that are free or cheap to obtain; and it is retail-accessible through exchange APIs, so a researcher can wire an agent to a live or paper account without institutional infrastructure. A representative recent preprint states the appeal plainly: crypto “present[s] unique opportunities for algorithmic trading due to [its] inherent volatility, 24/7 operation, and rich on-chain data availability.” None of these properties is a claim about profitability — they are reasons crypto is convenient to study.

That convenience is also the vault’s central caution. Because crypto is so easy to backtest, the literature accumulates positive crypto backtests faster than it accumulates credible ones — the popularity of the testbed inflates the count of optimistic results without improving their quality. Crypto’s defining feature for the modeller, extreme volatility paired with a low signal-to-noise ratio, is exactly the condition under which overfitting is most acute: with thousands of five-minute bars and 2,700 hyperparameter combinations to search, a “lucky” agent that fits noise is almost guaranteed to appear (Gort et al. 2022). The market’s youth — most coins have under a decade of history — means small effective samples and structural breaks that violate the stationarity assumptions of Markov-chain, HMM and MDP models alike. And crypto’s transaction costs are materially higher than in mature equity markets: exchange fees of 0.1-0.3% per side, wide spreads on all but the largest coins, and severe market impact during stress — so a backtest computed gross of costs, or net of an optimistic flat fee, overstates net performance more here than almost anywhere else.

  • Cryptocurrency Market [relates] Reinforcement Learning Trading Policy
  • Overfitting in Quantitative Trading [causes] Backtest-to-Live Performance Gap
  • Transaction Costs and Slippage [opposes] Reinforcement Learning Trading Policy
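The overfitting point above can be made concrete with a small simulation. The sketch below is an illustration only, not the procedure used by Gort et al.: it assumes each of the 2,700 hyperparameter combinations produces pure Gaussian noise returns (zero true edge), picks the combination with the best in-sample Sharpe ratio, and then looks at the same combination on fresh noise. The function names, the 200-bar sample length and the 1% per-bar volatility are hypothetical choices made for the example.

```python
import math
import random


def sharpe(returns):
    """Per-bar Sharpe ratio (mean / sample stdev), not annualised."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) if var > 0 else 0.0


def best_of_n_noise(n_configs, n_bars, seed=0):
    """Select the best in-sample config when every config is pure noise.

    Returns (in-sample Sharpe of the winner, out-of-sample Sharpe of the winner).
    Assumption: returns are i.i.d. Gaussian with zero mean, i.e. no real edge.
    """
    rng = random.Random(seed)
    best_is, best_oos = float("-inf"), 0.0
    for _ in range(n_configs):
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_bars)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_bars)]
        s = sharpe(in_sample)
        if s > best_is:
            best_is, best_oos = s, sharpe(out_sample)
    return best_is, best_oos


# 2,700 configurations as in the text; 200 bars is an arbitrary illustrative length.
best_is, best_oos = best_of_n_noise(n_configs=2700, n_bars=200)
print(f"winner in-sample Sharpe:     {best_is:.3f}")
print(f"winner out-of-sample Sharpe: {best_oos:.3f}")
```

Under these assumptions the winner's in-sample Sharpe is the maximum of 2,700 noise draws and so is positive by construction, while its out-of-sample Sharpe is just one more noise draw. A selection step that reports only the former will look like skill even when none exists.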

The vault’s clearest in-market evidence is Gort et al. 2022, a deep-RL crypto study that frames trading as a Markov Decision Process Trading Model and tests PPO, TD3 and SAC agents on ten coins over a May-June 2022 window containing two crashes. Its result is decisive and negative: every method lost money, and the best, least-overfitted agent (the one surviving a combinatorial-cross-validation overfitting test) still lost 34.96% — better than the −47.78% equal-weight portfolio and −50.78% market index, but a one-third capital loss in eight weeks. The paper is genuinely out-of-sample, charges a 0.3%/trade fee and releases replication code, yet it cannot show profit; it explicitly frames the prior crypto-DRL literature’s optimistic backtests as “false positive[s]” from overfitting. This is the crash-regime problem in concrete form: crypto’s drawdowns are not tail curiosities but recurring features, and an agent that looks profitable in a bull sample is routinely destroyed in the next crash. Bandarupalli 2025 reaches a quieter version of the same verdict — a cost-aware, out-of-sample PPO agent on BTC/ETH/SPY achieved a Sharpe of 1.23 against 1.46 for passive buy-and-hold and a final NAV below buy-and-hold, i.e. the RL agent underperformed the trivial benchmark once costs were honestly charged.

  • Gort et al. 2022 [trades_market] Cryptocurrency Market
  • Gort et al. 2022 [reports_underperformance] Reinforcement Learning Trading Policy
  • Bandarupalli 2025 [reports_underperformance] Reinforcement Learning Trading Policy
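The loss figures quoted above are easier to weigh with a little arithmetic. The sketch below uses only the three returns reported in the text (−34.96%, −47.78%, −50.78%) and the 0.3% per-trade fee; the helper names and the 100-trade count are hypothetical, and the fee model (a proportional fee on the full position at every trade) is a simplifying assumption rather than the paper's exact accounting.

```python
def break_even_gain(loss):
    """Gain needed to recover a fractional loss (0.5 -> 1.0, i.e. a 100% gain)."""
    return 1.0 / (1.0 - loss) - 1.0


def relative_to_benchmark(strategy_return, benchmark_return):
    """Ending capital of the strategy relative to the benchmark, minus 1."""
    return (1.0 + strategy_return) / (1.0 + benchmark_return) - 1.0


def fee_drag(fee_per_trade, n_trades):
    """Fraction of capital consumed by fees alone.

    Assumption: the fee is charged on the whole position at every trade.
    """
    return 1.0 - (1.0 - fee_per_trade) ** n_trades


# Returns as reported in the text for the May-June 2022 window.
agent, equal_weight, market_index = -0.3496, -0.4778, -0.5078

print(f"gain needed to recover the agent's loss: {break_even_gain(-agent):.2%}")
print(f"agent vs equal-weight portfolio:         {relative_to_benchmark(agent, equal_weight):+.2%}")
print(f"agent vs market index:                   {relative_to_benchmark(agent, market_index):+.2%}")

# Hypothetical trade count, chosen only to show how a 0.3% fee compounds.
print(f"fee drag over 100 trades at 0.3%:        {fee_drag(0.003, 100):.2%}")
```

Two things follow. Beating the benchmarks in relative terms is compatible with needing a gain of roughly 54% just to get back to the starting capital. And at 0.3% per trade, a hypothetical 100 trades would consume roughly a quarter of capital before any market move is counted.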

Positive crypto backtests do exist, and they illustrate the inflation problem precisely. Paykan et al. 2025 reports that a Soft Actor-Critic portfolio agent “consistently outperform[s]” a Markowitz mean-variance benchmark on a four-coin portfolio — but the benchmark is only Markowitz, a weak baseline; the paper does not benchmark against passive buy-and-hold Bitcoin (which “nearly tripled” over the same window), and even the winning SAC agent carries a −40.9% maximum drawdown while the DDPG variant produced negative mean returns. A favourable-sounding headline against a soft benchmark, with a 40% drawdown and no hard benchmark or replication, adds one more unit to the count of positive crypto-RL papers without adding evidence of tradeable alpha. The honest conclusion for this market: the volume of optimistic crypto backtests reflects how easy crypto is to study, not how profitable trading it is.

  • Paykan et al. 2025 [reports_profitability] Reinforcement Learning Trading Policy
  • Paykan et al. 2025 [contradicts] Buy-and-Hold Benchmark
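The soft-benchmark problem described above can be shown with a minimal scorecard. The sketch below is illustrative only: the equity curves are invented toy numbers, not data from Paykan et al., and the function names are hypothetical. It reports total return and maximum drawdown, and scores the agent against every benchmark supplied rather than only the most flattering one.

```python
def total_return(equity):
    """Simple return from the first to the last point of an equity curve."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Worst peak-to-trough decline, returned as a negative fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def scorecard(agent, benchmarks):
    """Excess return of the agent over each named benchmark curve."""
    agent_ret = total_return(agent)
    return {name: agent_ret - total_return(curve) for name, curve in benchmarks.items()}


# Toy equity curves (hypothetical): the agent beats a weak benchmark
# but trails passive buy-and-hold over the same window.
agent_curve = [100, 130, 90, 120, 150]
benchmarks = {
    "mean_variance": [100, 105, 85, 100, 120],
    "buy_and_hold": [100, 160, 120, 220, 290],
}

print(f"agent total return: {total_return(agent_curve):+.1%}")
print(f"agent max drawdown: {max_drawdown(agent_curve):+.1%}")
for name, excess in scorecard(agent_curve, benchmarks).items():
    print(f"excess over {name}: {excess:+.1%}")
```

In this toy case the same agent "outperforms" or "underperforms" depending solely on which benchmark is reported, which is why the absence of a buy-and-hold comparison matters.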

The Markov family is also applied to crypto in a purely regime-classification mode, where the evidence is sounder because no profitability is claimed. Cortese Kolm Lindström 2023 fits a sparse Statistical Jump Model to the largest cryptocurrencies and finds a clean three-state bull/neutral/bear structure driven by return moments, trend/reversal signals, market activity and public attention. Earlier, Chappell 2018 applied Markov regime-switching / hidden Markov models to Bitcoin returns, among the first to do so, and found that a restricted five-state model best captured crypto’s volatility clustering, volatility jumps and asymmetric transitions. A wider literature (Koki, Leonardos & Piliouras 2022; Giudici & Abu-Hashish 2020) confirms that HMMs reliably describe crypto’s bull/stable/bear regimes. This regime-classification use is the defensible application: it is real and useful, but it stops short of, and does not substantiate, a tradeable edge. That is exactly the Regime Classification vs. tradeable-alpha distinction this vault keeps drawing.

  • Cortese Kolm Lindström 2023 [detects_regime] Cryptocurrency Market
  • Chappell 2018 [detects_regime] Cryptocurrency Market
  • Statistical Jump Model [defines] Regime Classification
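To show what a regime classification delivers once the states are known, the sketch below estimates a first-order Markov transition matrix from an already-labelled regime sequence. It is deliberately not a statistical jump model or an HMM fit: the labels are hypothetical and supplied by hand, and the function names are invented for the example. The output is descriptive (how persistent each regime is), which is the sense in which this use claims no tradeable edge.

```python
from collections import Counter


def transition_matrix(labels, states):
    """Maximum-likelihood transition probabilities from a labelled sequence.

    Each row is normalised by the number of observed transitions out of that state.
    """
    counts = Counter(zip(labels, labels[1:]))
    matrix = {}
    for a in states:
        row_total = sum(counts[(a, b)] for b in states)
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0) for b in states
        }
    return matrix


def expected_duration(p_stay):
    """Expected consecutive periods in a state with self-transition probability p_stay."""
    return 1.0 / (1.0 - p_stay) if p_stay < 1.0 else float("inf")


# Hypothetical hand-labelled regimes, one label per period.
states = ["bull", "neutral", "bear"]
labels = ["bull", "bull", "bull", "neutral", "neutral", "bear",
          "bear", "bear", "bear", "neutral", "bull", "bull"]

matrix = transition_matrix(labels, states)
for state in states:
    row = "  ".join(f"{matrix[state][b]:.2f}" for b in states)
    stay = matrix[state][state]
    print(f"{state:>7}: {row}   expected duration {expected_duration(stay):.1f}")
```

A persistent diagonal says regimes tend to last; it does not say which regime comes next early enough, or net of costs, to trade on.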

Connections

Sources

  • Gort, B. J. D., Liu, X.-Y., Sun, X., Gao, J., Chen, S., & Wang, C. D. (2023). “Deep Reinforcement Learning for Cryptocurrency Trading: Practical Approach to Address Backtest Overfitting.” arXiv:2209.05559 — https://arxiv.org/abs/2209.05559
  • Cortese, F. P., Kolm, P. N., & Lindström, E. (2023). “What drives cryptocurrency returns? A sparse statistical jump model approach.” Digital Finance 5(3), 483-518 — https://doi.org/10.1007/s42521-023-00085-x
  • Bandarupalli, E. (2025). “Risk-Aware Deep Reinforcement Learning for Crypto and Equity Trading Under Transaction Costs.” SSRN Working Paper — https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5662930
  • Paykan, K. (2025). “Reinforcement Learning-Based Cryptocurrency Portfolio Management Using Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms.” arXiv:2511.20678 — https://arxiv.org/html/2511.20678v1
  • Chappell, D. (2018). “Regime heteroskedasticity in Bitcoin: A comparison of Markov switching models.” MPRA Paper 90682, University Library of Munich — https://ideas.repec.org/p/pra/mprapa/90682.html
  • Zhang, W. (2025). “Neural Network-Based Algorithmic Trading Systems: Multi-Timeframe Analysis and High-Frequency Execution in Cryptocurrency Markets.” arXiv:2508.02356 — https://arxiv.org/html/2508.02356v1