Bellman Equation
The Bellman equation is the recursive optimality relation, derived by Richard Bellman, that defines the optimal value of a state in a Markov Decision Process: V*(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V*(s') ], where R is the reward function, T the transition kernel, and gamma the discount factor. In words, the optimal value of a state is the best action's immediate reward plus the discounted expected value of the successor state. It is the mathematical heart of every Markov Decision Process Trading Model: the state-value and action-value (Q) functions are linked through it, and an optimal policy is one that acts greedily with respect to the value function satisfying the Bellman optimality equation. In continuous time the corresponding object is the Hamilton-Jacobi-Bellman partial differential equation.
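The link between the state-value and action-value functions mentioned above can be written out explicitly. The block below is a standard restatement using the same symbols as the equation above (R, T, gamma); the names Q* and pi* are introduced here for illustration and are not notation taken from the source.

```latex
\begin{align}
Q^*(s,a) &= R(s,a) + \gamma \sum_{s'} T(s,a,s')\, V^*(s'), \\
V^*(s)   &= \max_a Q^*(s,a), \\
\pi^*(s) &\in \arg\max_a Q^*(s,a).
\end{align}
```

Substituting the first line into the second recovers the Bellman optimality equation for V*; the third line says an optimal policy picks, in each state, any action that attains the maximum.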
The equation is solved by the Dynamic Programming algorithms: Value Iteration, which applies the Bellman operator repeatedly until convergence, and policy iteration, which alternates policy evaluation with policy improvement. For gamma < 1 the Bellman operator is a contraction, so Value Iteration converges to the unique fixed point V*. The Hambly Xu Yang 2023 survey introduces the Bellman equation as the common link underlying value-based and policy-based RL methods in finance. The equation guarantees optimality only conditional on the model: it says nothing about whether the transition kernel is correct or whether the optimal policy is profitable in a live market.
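Value Iteration as described above can be sketched in a few lines of NumPy. This is a minimal illustration, not an implementation from the survey: the function names `bellman_backup` and `value_iteration`, the tolerance, and the two-state, two-action MDP (its rewards and deterministic transitions) are all hypothetical and chosen only so the result is easy to check by hand.

```python
import numpy as np


def bellman_backup(V, R, T, gamma):
    """Apply the Bellman optimality operator once.

    R has shape (S, A); T has shape (S, A, S) with T[s, a, s2] = P(s2 | s, a).
    Returns the action values Q (shape (S, A)) and the backed-up state values.
    """
    Q = R + gamma * (T @ V)  # expected next-state value for every (s, a)
    return Q, Q.max(axis=1)


def value_iteration(R, T, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator until successive values differ by < tol."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        _, V_new = bellman_backup(V, R, T, gamma)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q, _ = bellman_backup(V, R, T, gamma)
    return V, Q.argmax(axis=1)  # greedy policy with respect to V


# Hypothetical toy MDP: action 0 stays in the current state,
# action 1 switches to the other state (both deterministic).
T = np.zeros((2, 2, 2))
for s in range(2):
    T[s, 0, s] = 1.0
    T[s, 1, 1 - s] = 1.0

# Assumed rewards: staying in state 1 pays 1 per step,
# switching out of state 0 costs 0.1, everything else pays 0.
R = np.array([[0.0, -0.1],
              [1.0, 0.0]])

V, policy = value_iteration(R, T, gamma=0.9)
print(V, policy)
```

In this toy model the greedy policy switches out of state 0 and then stays in state 1, and the returned V is (up to the tolerance) a fixed point of `bellman_backup`. That optimality holds only for the assumed R and T, which is exactly the caveat raised above: the result is conditional on the model being right.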
Connections
- Markov Decision Process Trading Model — part-of, source: https://arxiv.org/abs/2112.04553
- Richard Bellman — relates, source: https://en.wikipedia.org/wiki/Richard_E._Bellman
- Dynamic Programming — relates, source: https://en.wikipedia.org/wiki/Richard_E._Bellman
- Value Iteration — relates, source: https://en.wikipedia.org/wiki/Richard_E._Bellman
- Hambly Xu Yang 2023 — relates, source: https://arxiv.org/abs/2112.04553