Bandarupalli 2025

“Risk-Aware Deep Reinforcement Learning for Crypto and Equity Trading Under Transaction Costs” (Ekantheswar Bandarupalli, SSRN working paper, 26 October 2025) trains a Proximal Policy Optimization (PPO) agent to take long, flat or short positions in Bitcoin, Ethereum and SPY. Its reward function explicitly charges transaction costs and adds a volatility-sensitive risk penalty. Evaluated out-of-sample on 2024 data after training on 2020-2024, the RL policy achieved a Sharpe ratio of 1.23 against 1.46 for passive buy-and-hold, and a final NAV of 1.916 against 2.213: the agent underperformed the trivial benchmark in both risk-adjusted and absolute terms. It appears in this vault as a clean, honestly costed counterexample to optimistic crypto-RL claims: when costs are charged and a hard buy-and-hold benchmark is used, the learned policy adds nothing. Grade is negative; evidence strength is alleged pending peer review and replication.
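
A reward of the kind summarized above (position return, less a transaction cost on position changes, less a volatility-sensitive penalty) can be sketched as follows. This is only an illustration: the paper's exact functional form is not reproduced in this note, so the function name `step_reward`, the parameters `cost_rate` and `risk_weight`, their default values, and the choice of a rolling standard deviation as the volatility measure are all assumptions.

```python
from statistics import pstdev

def step_reward(position, prev_position, asset_return, recent_returns,
                cost_rate=0.001, risk_weight=0.1):
    """Hypothetical per-step reward, not the paper's exact formula.

    position, prev_position: -1 (short), 0 (flat) or +1 (long).
    asset_return: simple return of the asset over the step.
    recent_returns: trailing window of asset returns used to estimate volatility.
    """
    # Profit or loss earned by holding the chosen position over the step.
    pnl = position * asset_return
    # Cost proportional to the size of the position change (a flip costs twice).
    cost = cost_rate * abs(position - prev_position)
    # Volatility estimate from the trailing window (assumed: population std dev).
    vol = pstdev(recent_returns) if len(recent_returns) > 0 else 0.0
    # Risk penalty scales with exposure, so a flat position is never penalized.
    risk = risk_weight * abs(position) * vol
    return pnl - cost - risk
```

Under this sketch, an agent that trades often pays `cost_rate` on every change of position, which is one plausible reason a learned policy can trail buy-and-hold once costs are charged: the benchmark pays the cost once and then holds.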

Connections

Sources