Dynamic Programming

Dynamic programming (DP) is the method, introduced by Richard Bellman in 1953, for solving sequential multi-stage decision problems by decomposing them into nested sub-problems and recursing on the value function. Applied to a Markov Decision Process, DP is the classical, model-based solution route: when the transition kernel and reward function are fully specified, DP — via Value Iteration or policy iteration applied to the Bellman Equation — computes the optimal policy exactly. It appears in this vault as the contrast point to reinforcement learning: Pedersen 2023 and Nasir et al 2021 both solve their trading MDPs by DP, while Reinforcement Learning Trading Policy is the alternative used when the model is unknown.

DP’s defining limitation is the Curse of Dimensionality — also named by Bellman — because exact DP requires sweeping the entire state space, it becomes computationally infeasible as the number of state variables grows. Pedersen 2023 states this directly: the DP solution of the Almgren-Chriss execution model “is infeasible for large portfolios.” DP guarantees optimality only with respect to the stated model; it confers no profitability and no protection against a mis-specified transition kernel.

Connections