Paykan 2025
“Reinforcement Learning-Based Cryptocurrency Portfolio Management Using Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms” (Kamal Paykan, arXiv:2511.20678, 2025) frames a four-coin crypto portfolio (BTC, ETH, LTC, DOGE) as a Markov decision process under partial observability. It trains SAC and DDPG agents, augmented with LSTM networks, on daily OHLCV data, using a reward that charges transaction costs. It reports that the RL agents “consistently outperform” a classical Markowitz mean-variance benchmark, with SAC best (final portfolio value 2.76, Sharpe 0.067). It appears in this vault as a representative positive crypto-RL backtest that nonetheless illustrates the inflation problem. The only benchmark is the weak Markowitz baseline, and no passive buy-and-hold Bitcoin benchmark is used, even though Bitcoin “nearly tripled” over the same window. Even the winning SAC agent carries a -40.9% maximum drawdown, DDPG produced negative mean returns, and there is no robustness or replication evidence. Grade is weak: a favourable headline against a soft benchmark adds to the count of optimistic crypto-RL papers without demonstrating tradeable alpha.
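To make the missing comparison concrete, the following is a minimal sketch of how an agent's value series could be scored next to a passive buy-and-hold Bitcoin series on the three metrics quoted above. The function names and the toy series are hypothetical and are not taken from the paper. The Sharpe ratio here is computed per period with a zero risk-free rate, which is an assumption; this note does not record the paper's exact convention.

```python
from statistics import mean, stdev


def simple_returns(values):
    """Period-over-period simple returns of a portfolio value series."""
    return [b / a - 1.0 for a, b in zip(values, values[1:])]


def sharpe_ratio(values, risk_free=0.0):
    """Per-period (non-annualised) Sharpe ratio; risk_free=0.0 is an assumption."""
    excess = [r - risk_free for r in simple_returns(values)]
    sd = stdev(excess)
    return mean(excess) / sd if sd > 0 else float("nan")


def max_drawdown(values):
    """Largest peak-to-trough decline, returned as a negative fraction."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                 # running high-water mark
        worst = min(worst, v / peak - 1.0)  # deepest fall below that mark
    return worst


def summarise(values):
    """Final value (relative to the start), Sharpe ratio and maximum drawdown."""
    return {
        "final_value": values[-1] / values[0],
        "sharpe": sharpe_ratio(values),
        "max_drawdown": max_drawdown(values),
    }


# Hypothetical toy series for illustration only, NOT data from the paper.
agent_values = [1.00, 1.10, 0.90, 1.20, 1.50]
btc_prices = [100.0, 120.0, 90.0, 150.0, 200.0]

# Buy-and-hold: invest 1.0 in BTC at the start and never rebalance.
buy_and_hold = [p / btc_prices[0] for p in btc_prices]

agent = summarise(agent_values)
passive = summarise(buy_and_hold)
print("agent  :", agent)
print("passive:", passive)
```

In this toy case the agent ends at 1.5 while the passive position ends at 2.0, so a backtest that only beats a weaker baseline would still trail simply holding the asset. That is the kind of gap a buy-and-hold benchmark is meant to expose.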
Connections
- Cryptocurrency Market — trades_market, four-coin portfolio BTC/ETH/LTC/DOGE, source: https://arxiv.org/html/2511.20678v1
- Reinforcement Learning Trading Policy — reports_profitability, SAC/DDPG agents beat Markowitz on risk-adjusted metrics, source: https://arxiv.org/html/2511.20678v1
- Markov Decision Process Trading Model — proposes_model, portfolio management framed as an MDP, source: https://arxiv.org/html/2511.20678v1
- Buy-and-Hold Benchmark — contradicts, no buy-and-hold Bitcoin benchmark used despite Bitcoin nearly tripling, source: https://arxiv.org/html/2511.20678v1
- Differential Sharpe Ratio — relates, a differential Sharpe ratio reward variant is tested, source: https://arxiv.org/html/2511.20678v1
- Transaction Costs and Slippage — includes_costs, reward charges a transaction-cost rate but slippage is not modelled, source: https://arxiv.org/html/2511.20678v1
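The reward ideas named in the last two connections can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation. The proportional cost model, the `cost_rate` value of 0.001, the adaptation rate `eta`, and all function and class names are hypothetical. The differential Sharpe ratio follows the standard Moody and Saffell formulation with exponential moving estimates of the first and second moments of returns; whether the paper uses exactly this form is not recorded in this note.

```python
def net_return(weights_prev, weights_new, asset_returns, cost_rate=0.001):
    """One-step portfolio return net of proportional trading costs.

    cost_rate is a hypothetical value; slippage is not modelled here.
    """
    gross = sum(w * r for w, r in zip(weights_new, asset_returns))
    turnover = sum(abs(n - p) for n, p in zip(weights_new, weights_prev))
    return gross - cost_rate * turnover


class DifferentialSharpe:
    """Differential Sharpe ratio (Moody and Saffell form), updated online."""

    def __init__(self, eta=0.01):
        self.eta = eta  # adaptation rate of the moving estimates (assumed value)
        self.a = 0.0    # moving estimate of the mean return
        self.b = 0.0    # moving estimate of the mean squared return

    def step(self, r):
        """Return the reward for return r, then update the moment estimates."""
        delta_a = r - self.a
        delta_b = r * r - self.b
        variance = self.b - self.a ** 2
        if variance > 1e-12:
            reward = (self.b * delta_a - 0.5 * self.a * delta_b) / variance ** 1.5
        else:
            reward = 0.0  # estimates are not informative yet
        self.a += self.eta * delta_a
        self.b += self.eta * delta_b
        return reward


# Hypothetical usage: move from an equal split of two assets to all-in on the first.
r = net_return([0.5, 0.5], [1.0, 0.0], [0.02, -0.01])
dsr = DifferentialSharpe()
print(r, dsr.step(r))
```

Charging costs on turnover penalises an agent for reshuffling weights every day, while the differential Sharpe ratio gives a per-step signal that rewards returns above the running mean relative to running volatility. Neither term accounts for slippage.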
Sources
- Paykan, K. (2025). “Reinforcement Learning-Based Cryptocurrency Portfolio Management Using Soft Actor-Critic and Deep Deterministic Policy Gradient Algorithms.” arXiv:2511.20678 — https://arxiv.org/html/2511.20678v1