Bellman Equation

The Bellman equation is the recursive optimality relation, derived by Richard Bellman, that defines the optimal value of a state in a Markov Decision Process: V*(s) = max_a [ R(s,a) + gamma * sum_s' T(s,a,s') V*(s') ]. Here R(s,a) is the immediate reward, T(s,a,s') is the probability of moving from state s to s' under action a, and gamma is the discount factor, so the optimal value is the best action's immediate reward plus the discounted expected value of the successor state. It is the mathematical heart of every Markov Decision Process Trading Model: the state-value and action-value (Q) functions are linked through it, and an optimal policy is one that acts greedily with respect to the value function that solves the Bellman optimality equation. In continuous time the corresponding object is the Hamilton-Jacobi-Bellman partial differential equation.
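
The link between the state-value and action-value functions can be written out explicitly. The block below is a sketch in the same notation as the equation above; the symbols Q* and pi* are introduced here for illustration and do not appear in the original statement.

```latex
\begin{align}
Q^*(s,a) &= R(s,a) + \gamma \sum_{s'} T(s,a,s')\, V^*(s') \\
V^*(s)   &= \max_a Q^*(s,a) \\
\pi^*(s) &\in \arg\max_a Q^*(s,a)
\end{align}
```

Substituting the first line into the second recovers the Bellman optimality equation for V*, and the third line is what "acting greedily" means: in each state, pick any action that attains the maximum.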

The equation is solved either by iterating the Bellman operator to convergence (Value Iteration) or by alternating policy evaluation and policy improvement (policy iteration); both are Dynamic Programming algorithms. The Hambly Xu Yang 2023 survey introduces the Bellman equation as the object linking all value- and policy-based RL methods in finance. The equation guarantees optimality only conditional on the model: it says nothing about whether the transition kernel is correct or whether the optimal policy is profitable in a live market.
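
Iterating the Bellman operator to convergence can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `value_iteration`, the tolerance, and the two-state, two-action MDP with its transition probabilities and rewards are all made up for the example and carry no market meaning.

```python
import numpy as np


def value_iteration(T, R, gamma=0.95, tol=1e-10, max_iter=10_000):
    """Apply the Bellman optimality operator until V stops changing.

    T: array of shape (S, A, S), T[s, a, s2] = P(next state s2 | s, a)
    R: array of shape (S, A), immediate reward for action a in state s
    Returns (V, Q, policy), with policy[s] the greedy action in state s.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q(s,a) = R(s,a) + gamma * sum_s2 T(s,a,s2) V(s2)
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)  # V(s) = max_a Q(s,a)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (T @ V)
    return V, Q, Q.argmax(axis=1)


# Hypothetical toy MDP (illustrative numbers only): 2 states, 2 actions.
T = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # transitions from state 0 under actions 0, 1
    [[0.1, 0.9], [0.7, 0.3]],  # transitions from state 1 under actions 0, 1
])
R = np.array([
    [0.0, -0.1],  # rewards in state 0 for actions 0, 1
    [1.0, 0.5],   # rewards in state 1 for actions 0, 1
])

V, Q, policy = value_iteration(T, R, gamma=0.95)
print("V* =", V)
print("greedy policy =", policy)
```

The returned V is a fixed point of the Bellman operator only for the T and R passed in. If the estimated transition kernel is wrong, the same code still converges, just to the optimum of the wrong model.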

Connections