Richard Bellman

Richard Ernest Bellman (August 26, 1920 – March 19, 1984) was an American applied mathematician. Born in New York City, he studied mathematics at Brooklyn College (BA, 1941) and earned his PhD at Princeton (1946). He spent the formative years of his career at the RAND Corporation from 1949, where U.S. Department of Defense problems in multi-stage logistics and control motivated the work he is now best known for, before moving to the University of Southern California. He appears in this vault because the machinery he created is the solution layer beneath every Markov Decision Process Trading Model.

Bellman introduced Dynamic Programming in 1953 and gave it its first comprehensive treatment in the 1957 monograph Dynamic Programming (Princeton University Press). Its conceptual core is the Principle of Optimality: an optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must form an optimal policy with respect to the state resulting from the first decision. That principle yields a recursive relation, the Bellman Equation, which expresses the optimal value of a state as the maximum over actions of the immediate reward plus the discounted expected optimal value of the successor state. In the MDP setting this is the Bellman optimality equation, solved by Value Iteration or policy iteration; in continuous time the corresponding object is the Hamilton-Jacobi-Bellman equation.
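The recursion described above can be sketched in a few lines. This is a minimal illustration of Value Iteration on a made-up two-state, two-action MDP, not a trading model: the function name `value_iteration`, the array layout (`P[a, s, s']`, `R[s, a]`), the discount factor, and every number in the toy problem are assumptions chosen for the example.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=10_000):
    """Solve V(s) = max_a [ R[s, a] + gamma * sum_s' P[a, s, s'] * V(s') ].

    P: array of shape (A, S, S) with transition probabilities (assumed layout).
    R: array of shape (S, A) with expected immediate rewards (assumed layout).
    Returns the value function and a greedy policy.
    """
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        # Q[s, a]: immediate reward plus discounted expected successor value.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Greedy policy with respect to the final value estimate.
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return V, Q.argmax(axis=1)

# Hypothetical toy problem: action 0 keeps the current state, action 1 switches.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
    [[0.0, 1.0], [1.0, 0.0]],   # action 1: switch
])
# Only staying in state 1 pays a reward of 1; everything else pays 0.
R = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
])

V, policy = value_iteration(P, R, gamma=0.9)
print("V* =", V)
print("policy =", policy)
```

In this toy problem the fixed point can be checked by hand: staying in state 1 forever is worth 1 / (1 - 0.9) = 10, and state 0 is worth 0.9 × 10 = 9 because the best move is to switch first. The greedy policy is therefore "switch" in state 0 and "stay" in state 1.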

Richard Bellman defines Dynamic Programming
Richard Bellman defines Bellman Equation
Dynamic Programming supports Markov Decision Process Trading Model

Bellman also coined the phrase “curse of dimensionality” in the same 1957 book, to describe the exponential blow-up of a state space as variables are added. This is not a footnote: it is the binding limitation on the classical model-based MDP route. As Pedersen 2023 reports, exact dynamic programming “is infeasible for large portfolios,” which is precisely why Reinforcement Learning Trading Policy methods and function approximation are used to obtain near-optimal policies instead of exact ones. Bellman’s contribution is therefore double-edged for this vault: he supplies both the optimality machinery that makes an MDP solvable and the named reason it does not scale.
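The exponential growth is easy to make concrete. The sketch below assumes, purely for illustration, that the state is one discretized position per asset and that a tabular value function stores one 8-byte float per state; the helper name `table_size` and the choice of 10 levels per asset are hypothetical.

```python
def table_size(n_assets: int, levels_per_asset: int) -> int:
    """Number of entries in a tabular value function over a full grid."""
    return levels_per_asset ** n_assets

BYTES_PER_ENTRY = 8  # one float64 per state (assumption)

for n_assets in (1, 2, 5, 10, 20):
    states = table_size(n_assets, levels_per_asset=10)
    print(f"{n_assets:>2} assets: {states:.3e} states, "
          f"{states * BYTES_PER_ENTRY:.3e} bytes")
```

Under these assumptions a 10-asset grid already has 10^10 states, about 80 GB for the value table alone, before any sweep of the Bellman update has been run.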

Curse of Dimensionality opposes Dynamic Programming
Bellman Equation part-of Markov Decision Process Trading Model

One caution: dynamic programming guarantees an optimal policy only for the stated model. It says nothing about whether the model is correct, nor whether the resulting policy is profitable in a live market. The Bellman equation characterizes optimality conditional on the transition kernel and reward function; it is not a guarantee of tradeable edge, which is exactly the distinction this vault draws between an MDP formulation and a profitable strategy.
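A small numerical sketch of this gap, under loudly hypothetical assumptions: a two-state MDP in which the modeller believes a costly "trade" action reaches the rewarding state with probability 0.9, while the true probability is 0.1. All names (`make_mdp`, `optimal_policy`, `evaluate`, `p_fill`) and all numbers are invented for the illustration. The policy is solved on the assumed kernel and then evaluated exactly on the true one.

```python
import numpy as np

GAMMA = 0.9  # discount factor (assumption)

def make_mdp(p_fill):
    """Hypothetical MDP; p_fill = P(reach state 1 | state 0, action 1)."""
    P = np.zeros((2, 2, 2))            # P[a, s, s']
    P[0, 0] = [1.0, 0.0]               # hold in state 0: stay put
    P[1, 0] = [1.0 - p_fill, p_fill]   # trade in state 0: may reach state 1
    P[:, 1] = [0.5, 0.5]               # state 1 falls back to state 0 half the time
    R = np.array([[0.0, -0.5],         # state 0: holding is free, trading costs 0.5
                  [1.0,  0.5]])        # state 1: pays 1 when holding, 0.5 when trading
    return P, R

def optimal_policy(P, R, n_iter=2000):
    """Value iteration with a fixed number of sweeps, then a greedy policy."""
    V = np.zeros(R.shape[0])
    for _ in range(n_iter):
        V = (R + GAMMA * np.einsum("ast,t->sa", P, V)).max(axis=1)
    Q = R + GAMMA * np.einsum("ast,t->sa", P, V)
    return Q.argmax(axis=1), V

def evaluate(policy, P, R):
    """Exact value of a fixed policy: solve (I - gamma * P_pi) V = R_pi."""
    idx = np.arange(len(policy))
    P_pi = P[policy, idx]              # row s is P[policy[s], s, :]
    R_pi = R[idx, policy]
    return np.linalg.solve(np.eye(len(policy)) - GAMMA * P_pi, R_pi)

P_model, R = make_mdp(p_fill=0.9)      # what the modeller believes
P_true, _ = make_mdp(p_fill=0.1)       # what actually happens (hypothetical)

policy, V_promised = optimal_policy(P_model, R)
V_realized = evaluate(policy, P_true, R)
policy_true, V_best = optimal_policy(P_true, R)

print("policy from model:", policy)
print("value promised by the model:", V_promised)
print("value realized under the true kernel:", V_realized)
print("best achievable under the true kernel:", V_best)
```

In this toy setup the model-optimal policy trades in state 0 and promises a positive value there, yet under the true kernel the same policy has a negative value in state 0, while simply holding would have earned zero. The optimization was exact in both cases; only the kernel it was given was wrong.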

Connections

Sources