Value Iteration

Value iteration is the Dynamic Programming algorithm that solves a Markov Decision Process by repeatedly applying the Bellman optimality operator (see Bellman Equation) to a value-function estimate until successive iterates stop changing, after which the optimal policy is read off greedily. It is the standard exact (model-based) solver for a fully specified MDP: for a finite MDP with a known model and a discount factor below 1, the operator is a contraction, so the iterates are guaranteed to converge to the optimal value function. It appears in this vault as the concrete method used in Nasir et al 2021, which estimates an option-trading transition kernel from statistical data and then solves the resulting MDP by value iteration; Pedersen 2023 uses the same dynamic-programming family to solve the Almgren-Chriss execution model.
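As a minimal sketch of the loop described above, the following pure-Python version runs value iteration on a tabular MDP. The names `value_iteration` and `TOY_MDP`, the `(probability, next_state, reward)` transition layout, and the two-state example are illustrative assumptions, not the setup of Nasir et al 2021 or Pedersen 2023.

```python
from typing import List, Tuple

# Assumed layout: P[s][a] is a list of (probability, next_state, reward) triples.
Transitions = List[List[List[Tuple[float, int, float]]]]


def value_iteration(P: Transitions, gamma: float = 0.95,
                    tol: float = 1e-8, max_sweeps: int = 10_000):
    """Return (V, policy) for a finite MDP with a known transition model."""
    n_states = len(P)
    V = [0.0] * n_states

    def q_value(s: int, a: int, values: List[float]) -> float:
        # Expected one-step reward plus discounted value of the successor state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in P[s][a])

    for _ in range(max_sweeps):
        # One Bellman optimality backup over the whole state space.
        new_V = [max(q_value(s, a, V) for a in range(len(P[s])))
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(new_V, V))
        V = new_V
        if delta < tol:  # iterates have effectively stopped changing
            break

    # Read the policy off greedily from the converged values.
    policy = [max(range(len(P[s])), key=lambda a: q_value(s, a, V))
              for s in range(n_states)]
    return V, policy


# Hypothetical two-state, two-action MDP (deterministic transitions).
TOY_MDP: Transitions = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to 1 (reward 1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 2) or move to 0 (reward 0)
]

V, policy = value_iteration(TOY_MDP, gamma=0.5)
print(V, policy)
```

In this toy problem the greedy policy moves from state 0 to state 1 and then stays there. The stopping rule is only a tolerance on the sup-norm change between sweeps, so the returned values approximate the fixed point rather than hit it exactly.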

Value iteration’s reach is bounded by the Curse of Dimensionality: it sweeps the entire (discretised) state space on each iteration, so the cost per sweep grows exponentially with the number of state variables, which is why large MDPs are handed to Reinforcement Learning Trading Policy methods instead. Like all dynamic programming, value iteration returns the optimal policy for the model it is given; any estimation error in the transition kernel is silently propagated into the policy, so a value-iteration solution is no guarantee of live profitability.
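To make the scaling concrete, here is a small back-of-the-envelope sketch of how the work per sweep grows when each state variable is discretised on its own grid. The helper names `grid_states` and `backups_per_sweep`, the 100 grid points per variable, and the 3 actions are hypothetical figures chosen for illustration only.

```python
def grid_states(points_per_variable: int, n_variables: int) -> int:
    """Number of states on a full grid with the same resolution per variable."""
    return points_per_variable ** n_variables


def backups_per_sweep(points_per_variable: int, n_variables: int,
                      n_actions: int) -> int:
    """State-action backups that a single value-iteration sweep must evaluate."""
    return grid_states(points_per_variable, n_variables) * n_actions


# Assumed illustration: 100 grid points per variable, 3 actions.
for n_vars in range(1, 7):
    print(n_vars, grid_states(100, n_vars), backups_per_sweep(100, n_vars, 3))
```

Each extra state variable multiplies the sweep cost by the grid resolution, so under these assumptions going from three to six variables takes the grid from a million states to a trillion.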

Connections

Dynamic Programming — part-of, source: https://en.wikipedia.org/wiki/Richard_E._Bellman
Bellman Equation — relates, source: https://en.wikipedia.org/wiki/Richard_E._Bellman
Markov Decision Process Trading Model — relates, source: https://arxiv.org/abs/2112.04553
Nasir et al 2021 — optimises_policy, source: https://ideas.repec.org/a/kap/compec/v58y2021i2d10.1007_s10614-020-10030-4.html
Pedersen 2023 — relates, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
Curse of Dimensionality — suffers_overfitting_risk, source: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4508553
