Pedersen 2023
“Revisiting Optimal Execution of Portfolio Transactions: A Dynamic Programming and Reinforcement Learning Approach” is an 88-page University of Copenhagen MSc thesis / SSRN working paper (abstract ID 4508553, DOI 10.2139/ssrn.4508553) by Jens Pedersen, written 30 June 2023 and posted 21 July 2023. It revisits the Almgren Chriss 2000 optimal-execution model and compares four distinct ways of solving the same problem: a closed-form analytical solution, numerical optimisation, Dynamic Programming, and Reinforcement Learning Trading Policy (the DDPG actor-critic algorithm). It is the clearest “methods paper” in this vault — its contribution is a comparison of solution machinery on a known model, not a claim about tradeable profitability.
Pedersen 2023 tests_strategy Optimal Execution Pedersen 2023 relates Markov Decision Process Trading Model
The optimal-execution problem (liquidating or acquiring a fixed quantity over a finite horizon while balancing market-impact cost against the variance of execution cost) is a textbook sequential decision problem and maps cleanly onto an MDP. The state is remaining inventory and time-to-deadline (with price/volatility in richer variants), and the action is the quantity traded this period. The reward is the negative of the sum of the period’s execution cost and a risk-aversion-weighted penalty on cost variance; equivalently, that sum is a per-period cost to be minimised. This is the mean-variance implementation-shortfall criterion of Almgren-Chriss. Because that model is fully specified, the MDP can be solved exactly: analytically for the linear-impact Gaussian case, or by Value Iteration over a discretised state grid.
Almgren Chriss 2000 part-of Optimal Execution Optimal Execution relates Markov Decision Process Trading Model
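As a rough illustration of the analytical route, the sketch below computes the standard Almgren-Chriss closed-form holdings trajectory for a single asset with linear permanent and temporary impact. It is not code from the thesis. The function name, argument names, and the parameter values in the usage lines are illustrative assumptions, and the sketch covers only the risk-neutral and risk-averse cases (`lam >= 0`).

```python
import math

def almgren_chriss_holdings(X, T, N, sigma, eta, gamma, lam):
    """Holdings x_0..x_N under the Almgren-Chriss closed form (single asset).

    X: initial position, T: horizon, N: number of trading intervals,
    sigma: volatility, eta: temporary-impact coefficient,
    gamma: permanent-impact coefficient, lam: risk aversion (>= 0 here).
    All names are illustrative; units must be mutually consistent.
    """
    if lam < 0:
        raise ValueError("this sketch covers lam >= 0 only")
    tau = T / N                           # length of one trading interval
    eta_tilde = eta - 0.5 * gamma * tau   # impact coefficient adjusted for discrete time
    if eta_tilde <= 0:
        raise ValueError("eta must exceed gamma * tau / 2")
    if lam == 0:
        # Risk-neutral limit: the trajectory is a straight line (equal slices).
        return [X * (1 - j / N) for j in range(N + 1)]
    kappa_tilde_sq = lam * sigma ** 2 / eta_tilde
    # kappa solves (2 / tau^2) * (cosh(kappa * tau) - 1) = kappa_tilde^2
    kappa = math.acosh(1 + 0.5 * kappa_tilde_sq * tau ** 2) / tau
    denom = math.sinh(kappa * T)
    return [X * math.sinh(kappa * (T - j * tau)) / denom for j in range(N + 1)]

# Illustrative parameters (assumed for this example, not taken from Pedersen 2023).
risk_averse = almgren_chriss_holdings(1e6, 5.0, 5, 0.95, 2.5e-6, 2.5e-7, 2e-6)
risk_neutral = almgren_chriss_holdings(1e6, 5.0, 5, 0.95, 2.5e-6, 2.5e-7, 0.0)
```

With positive risk aversion the trajectory is front-loaded relative to the straight-line schedule, which is the trade-off between impact cost and cost variance described above.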
Pedersen’s headline result is the cleanest empirical statement in this vault of the MDP-formulation-versus-solution-method distinction. The dynamic-programming solution “align[s] with the model intuition”, meaning value iteration recovers the correct policy, but “the methodology is infeasible for large portfolios.” That is a direct, named instance of the Curse of Dimensionality: the discretised state space grows exponentially with the number of state dimensions (here, the assets in the portfolio), so exact dynamic programming does not scale. Reinforcement learning is positioned as the scalable alternative: DDPG “captures model dynamics and demonstrates advantages at different risk-aversion levels” without exhaustively solving the Bellman recursion. The analytical solution replicates the efficient frontier and implementation shortfall; numerical optimisation is consistent with Almgren-Chriss; risk-loving agents incur higher costs and are excluded from further analysis.
Curse of Dimensionality opposes Dynamic Programming Reinforcement Learning Trading Policy supports Markov Decision Process Trading Model
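The scaling argument can be made concrete with a toy backward-induction solver. This is a minimal sketch, not the thesis implementation: it assumes a single asset, inventory discretised into whole lots, a per-period cost of temporary impact plus a risk-weighted variance term, and no permanent impact. The names `solve_dp`, `rollout`, and `grid_points` are hypothetical.

```python
def solve_dp(lots, N, tau, sigma, eta, lam, lot_size):
    """Backward induction over (step, lots remaining); returns (values, policy)."""
    INF = float("inf")
    V = [0.0] + [INF] * lots              # at the deadline only zero inventory is allowed
    policy = []
    for _ in range(N):                    # walk backwards from the last trading step
        new_V = [INF] * (lots + 1)
        act = [0] * (lots + 1)
        for i in range(lots + 1):         # i = lots held before trading
            for a in range(i + 1):        # a = lots sold in this step
                rem = i - a
                if V[rem] == INF:
                    continue
                impact = eta * (a * lot_size) ** 2 / tau
                risk = lam * sigma ** 2 * tau * (rem * lot_size) ** 2
                cost = impact + risk + V[rem]
                if cost < new_V[i]:
                    new_V[i], act[i] = cost, a
        V = new_V
        policy.append(act)
    policy.reverse()                      # policy[k][i] = lots to sell at step k holding i
    return V, policy

def rollout(policy, lots):
    """Follow the DP policy from the initial inventory; returns the holdings path."""
    path = [lots]
    for act in policy:
        lots -= act[lots]
        path.append(lots)
    return path

def grid_points(levels_per_asset, n_assets):
    """Inventory grid size when each asset is discretised independently (assumption)."""
    return levels_per_asset ** n_assets

_, neutral_policy = solve_dp(10, 5, 1.0, 0.95, 2.5e-6, 0.0, 1e5)
neutral_path = rollout(neutral_policy, 10)   # equal slices in the risk-neutral case
```

Under that independent-grid assumption, `grid_points(11, 1)` is 11 states per time step while `grid_points(11, 10)` is already about 2.6e10, and the inner loop over actions grows in the same way. That is the sense in which exact dynamic programming stops being practical for larger portfolios.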
On profitability the grade is inconclusive — and the paper does not actually claim otherwise. This is a simulation study against a known model (Almgren-Chriss) with stylised and lightly calibrated empirical examples; it is not a live-trading or out-of-sample profit study. Its outputs are execution-cost and implementation-shortfall comparisons, not trading returns or Sharpe ratios. The explicit finding that RL “results differ across volatility and risk-aversion level” is itself a cautionary note: any reported advantage is conditional on regime and preference parameters, not an unconditional edge. Pedersen 2023 belongs in this vault as evidence that (a) the MDP formulation is the right scaffolding for optimal execution, (b) dynamic programming solves it correctly but does not scale, and (c) RL scales but inherits regime-dependence — none of which establishes that an MDP-based execution policy is profitable after costs in a live market.
Pedersen 2023 excludes_costs Out-of-Sample Backtesting Non-Stationarity opposes Markov Decision Process Trading Model
Connections
- Markov Decision Process Trading Model — tests_strategy, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- Optimal Execution — tests_strategy, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- Almgren Chriss 2000 — relates, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- Dynamic Programming — relates, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- Value Iteration — relates, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- Reinforcement Learning Trading Policy — relates, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- Curse of Dimensionality — suffers_overfitting_risk, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- US Equity Market — trades_market, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
- Out-of-Sample Backtesting — lacks_live_evidence, 2023, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553