Value Iteration

Value iteration is the Dynamic Programming algorithm that solves a Markov Decision Process by repeatedly applying the Bellman optimality operator (see Bellman Equation) to the value function until it converges, after which the optimal policy is read off greedily. It is the standard exact solver for a fully specified MDP: for a finite MDP with a known model and a discount factor below 1, the iterates are guaranteed to converge to the optimal value function. It appears in this vault as the concrete method used in Nasir et al 2021, which estimates an option-trading transition kernel from statistical data and then solves the resulting MDP by value iteration; Pedersen 2023 uses the same dynamic-programming family to solve the Almgren-Chriss execution model.
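A minimal sketch of the loop described above, assuming a tabular MDP stored as NumPy arrays. The array layout (`P` as action × state × next-state, `R` as action × state), the function name `value_iteration`, the tolerance, and the two-state toy problem are all illustrative assumptions, not taken from Nasir et al 2021 or Pedersen 2023.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8, max_iter=10_000):
    """Tabular value iteration (illustrative sketch).

    P: array of shape (A, S, S), P[a, s, s2] = Pr(next state s2 | state s, action a)
    R: array of shape (A, S),    R[a, s]     = expected reward for action a in state s
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Bellman optimality backup: Q[a, s] = R[a, s] + gamma * sum_s2 P[a, s, s2] * V[s2]
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Read the policy off greedily from the final value estimate
    policy = (R + gamma * (P @ V)).argmax(axis=0)
    return V, policy

# Hypothetical two-state toy MDP: action 0 stays put, action 1 switches state.
# Staying in state 1 pays 1; everything else pays 0.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch
])
R = np.array([
    [0.0, 1.0],                 # action 0
    [0.0, 0.0],                 # action 1
])

V, policy = value_iteration(P, R, gamma=0.95)
```

In this toy problem the greedy policy switches out of state 0 and stays in state 1, and the values approach 19 and 20 (that is, 0.95 × 20 and 1 / (1 − 0.95)). The stopping rule on the sup-norm change is one common choice; other stopping criteria are possible.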

Value iteration’s reach is bounded by the Curse of Dimensionality: each iteration sweeps the entire (discretised) state space, and the number of grid states grows exponentially with the number of state variables, which is why large MDPs are handed to Reinforcement Learning Trading Policy methods instead. Like all dynamic programming, value iteration returns the optimal policy only for the model it is given; any estimation error in the transition kernel propagates silently into the policy, so a value-iteration solution is no guarantee of live profitability.
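To make the scaling concrete, here is a rough back-of-the-envelope sketch. It assumes every state variable is discretised into the same number of bins and that the transition model is stored densely; the function name `sweep_cost` and the figures (50 bins, 3 actions) are hypothetical and chosen only for illustration.

```python
def sweep_cost(bins_per_variable, n_variables, n_actions):
    """Return (number of grid states, rough operation count for one sweep).

    Assumes a dense transition model, so one sweep touches every
    (action, state, next_state) triple.
    """
    n_states = bins_per_variable ** n_variables
    ops_per_sweep = n_actions * n_states ** 2
    return n_states, ops_per_sweep

# Hypothetical grid: 50 bins per variable, 3 actions
for n_variables in range(1, 5):
    n_states, ops = sweep_cost(50, n_variables, 3)
    print(f"{n_variables} variables: {n_states:,} states, ~{ops:,} ops per sweep")
```

Sparse transition models reduce the per-sweep cost considerably, but the state count itself still multiplies by the bin count with every added variable.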

Connections