Paykan et al. 2025

“Reinforcement Learning-Based Cryptocurrency Portfolio Management Using Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms” (Kamal Paykan, arXiv:2511.20678, 2025) frames a four-coin crypto portfolio (BTC, ETH, LTC, DOGE) as a Markov decision process under partial observability. It trains SAC and DDPG agents, augmented with LSTM networks, on daily OHLCV data, using a reward that charges transaction costs. It reports that the RL agents “consistently outperform” a classical Markowitz mean-variance benchmark, with SAC best (final portfolio value 2.76, Sharpe 0.067). It appears in this vault as a representative positive crypto-RL backtest that nonetheless illustrates the inflation problem. The only benchmark is the weak Markowitz baseline. No passive buy-and-hold Bitcoin benchmark is used, even though Bitcoin “nearly tripled” over the same window. Even the winning SAC agent carries a -40.9% maximum drawdown, DDPG produced negative mean returns, and there is no robustness or replication evidence. Grade: weak. A favourable headline against a soft benchmark adds to the count of optimistic crypto-RL papers without demonstrating tradeable alpha.
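The metrics quoted above (final portfolio value, Sharpe ratio, maximum drawdown) and the missing buy-and-hold comparison can be sketched in a few lines of Python. This is a minimal illustration, not the paper's evaluation code. The function names and both value series are hypothetical. The Sharpe ratio here is assumed to be per-period, non-annualised, with a zero risk-free rate; the note does not record which convention the paper uses.

```python
from statistics import mean, stdev


def simple_returns(values):
    """Per-period simple returns from a series of portfolio values."""
    return [b / a - 1.0 for a, b in zip(values, values[1:])]


def sharpe_ratio(returns, risk_free=0.0):
    """Per-period Sharpe ratio: mean excess return over its sample std.

    Assumption: non-annualised, risk-free rate given per period.
    """
    excess = [r - risk_free for r in returns]
    sd = stdev(excess)
    return mean(excess) / sd if sd > 0 else float("nan")


def max_drawdown(values):
    """Largest peak-to-trough decline as a negative fraction (e.g. -0.409)."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                 # running high-water mark
        worst = min(worst, v / peak - 1.0)  # decline from that mark
    return worst


def summarise(values):
    """Collect the three headline metrics for one value series."""
    rets = simple_returns(values)
    return {
        "final_value": values[-1] / values[0],  # normalised to a start of 1.0
        "sharpe": sharpe_ratio(rets),
        "max_drawdown": max_drawdown(values),
    }


# Hypothetical toy series, NOT data from the paper.
agent = [1.00, 1.10, 0.88, 1.21, 1.32]     # RL agent portfolio value
btc_hold = [1.00, 1.20, 0.90, 1.35, 1.50]  # passive buy-and-hold BTC

agent_stats = summarise(agent)
hold_stats = summarise(btc_hold)

# The comparison the note asks for: agent versus the passive benchmark.
excess_final = agent_stats["final_value"] - hold_stats["final_value"]

print("agent:", agent_stats)
print("buy-and-hold:", hold_stats)
print("excess final value vs buy-and-hold:", excess_final)
```

In this toy case the agent ends with a positive return yet trails the passive series, which is the pattern a Markowitz-only comparison cannot reveal.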

Connections

Sources

  • Paykan, K. (2025). “Reinforcement Learning-Based Cryptocurrency Portfolio Management Using Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms.” arXiv:2511.20678 — https://arxiv.org/html/2511.20678v1