Nasir et al 2021

“A Markov Decision Process Model for Optimal Trade of Options Using Statistical Data” is a peer-reviewed paper by Ali Nasir, Ambreen Khursheed, Kazim Ali and Faisal Mustafa (all University of Central Punjab), published in Computational Economics (Springer / Society for Computational Economics) 58(2):327-346, August 2021, DOI 10.1007/s10614-020-10030-4. It formulates the trading of American options as a Markov Decision Process and solves it by Value Iteration. It appears in this vault as a textbook example of the classical, model-based MDP route — build the model first, then solve it — and as a case study whose profitability evidence is weak.

Nasir et al 2021 proposes_model Markov Decision Process Trading Model
Nasir et al 2021 tests_strategy Options Market

The construction is a clean illustration of the model-based path. The decision problem — when to buy, hold or sell an American option — is sequential: an action today changes the position and the future opportunity set. The authors formalise it as an MDP whose states are characterised by option price levels and the values of the factors (features) influencing those prices; whose actions are the buy/sell/hold decisions; and whose transition probabilities are conditional probabilities of option prices given the factor values, estimated directly from statistical data. The reward is the realised return, and the objective is to maximise accumulated return. Estimating the transition kernel up front is the key step that makes the MDP fully specified.
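The estimation step can be sketched as a relative-frequency count. This is a minimal illustration, not the authors' procedure: the function name, the discretised labels ("up", "down", "high", "low") and the toy observations are all hypothetical, and the paper's actual feature set and binning are not reproduced here.

```python
from collections import Counter, defaultdict


def estimate_conditional_probs(observations):
    """Estimate P(next_price_level | factor_level) by relative frequency.

    observations: iterable of (factor_level, next_price_level) pairs,
    both already discretised into hashable labels (hypothetical encoding).
    """
    counts = defaultdict(Counter)
    for factor_level, next_price_level in observations:
        counts[factor_level][next_price_level] += 1

    probs = {}
    for factor_level, counter in counts.items():
        total = sum(counter.values())
        # Normalise each row so it is a valid conditional distribution.
        probs[factor_level] = {lvl: n / total for lvl, n in counter.items()}
    return probs


# Toy, made-up observations (not data from the paper).
toy = [("up", "high"), ("up", "high"), ("up", "low"), ("down", "low")]
kernel = estimate_conditional_probs(toy)
```

Note that the "down" row rests on a single observation. Sparse cells like this are where estimation error in the kernel is largest.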

Dynamic Programming supports Markov Decision Process Trading Model
Value Iteration part-of Dynamic Programming

Because the transition kernel is estimated in advance, the MDP is fully specified and can be solved by value iteration, the Dynamic Programming algorithm that iterates the Bellman optimality operator to convergence. This is the defining contrast with the reinforcement-learning siblings: here the model is built and then solved, whereas model-free RL would learn the policy from interaction without first estimating the transition probabilities. The case study applies the method to Microsoft and Coca-Cola options and shows how the resulting optimal policy “can be interpreted and used for making sales or purchase decisions,” with the authors claiming “significant advantages for the financial community … including investors.”
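Value iteration on a fully specified tabular MDP can be sketched as follows. The two-state toy problem (flat or long, with hold and trade actions), its rewards and its discount factor are invented for illustration and are not taken from the paper. The data layout (P[s][a] as a list of (probability, next_state) pairs) is likewise an assumption.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Tabular value iteration.

    P[s][a]: list of (probability, next_state) pairs.
    R[s][a]: immediate reward for taking action a in state s.
    Returns (V, policy) for the model given by P and R.
    """
    def q_value(V, s, a):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    V = [0.0] * len(P)
    for _ in range(max_iter):
        # One application of the Bellman optimality operator.
        new_V = [max(q_value(V, s, a) for a in range(len(P[s])))
                 for s in range(len(P))]
        delta = max(abs(n - o) for n, o in zip(new_V, V))
        V = new_V
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = [max(range(len(P[s])), key=lambda a: q_value(V, s, a))
              for s in range(len(P))]
    return V, policy


# Hypothetical toy model: state 0 = flat, state 1 = long.
# Action 0 = hold, action 1 = trade (buy when flat, sell when long).
P = [
    [[(1.0, 0)], [(1.0, 1)]],            # flat: stay flat / buy
    [[(0.5, 1), (0.5, 0)], [(1.0, 0)]],  # long: hold (may lapse) / sell
]
R = [
    [0.0, -1.0],  # buying costs a premium of 1
    [0.0, 2.0],   # selling realises 2
]
V, policy = value_iteration(P, R, gamma=0.9)
```

The returned policy is optimal only for the P and R passed in. Nothing in the algorithm checks whether those inputs describe the market.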

Nasir et al 2021 reports_profitability Markov Decision Process Trading Model

On profitability the grade is weak, and the gap between the paper’s claim and its evidence should be stated plainly. This is an in-sample, model-conditional case study on exactly two option chains. The paper reports no out-of-sample test, no train/test split, no transaction-cost or slippage adjustment, no benchmark, and no quantified return, Sharpe or drawdown — the “significant advantages” claim is qualitative and demonstrative. Value iteration provably finds the optimal policy for the estimated model; it gives no guarantee that the estimated conditional-probability transition kernel matches future option dynamics, and estimation error in that kernel compounds through the Bellman recursion. The result is best read as a worked methodological demonstration that the MDP-plus-value-iteration machinery applies to option trading — not as evidence of a tradeable, repeatable edge. It sits firmly in the vault’s pattern: the MDP is a sound formulation, but a formulation confers no profitability on its own.
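The compounding of kernel error can be illustrated by evaluating one fixed policy under two kernels. Everything below is a made-up toy, not a result from the paper: two price states, a single "keep holding" action, an estimated probability of 0.7 of moving to the high state, and a hypothetical true probability of 0.4.

```python
def evaluate_policy(policy, P, R, gamma=0.9, tol=1e-10, max_iter=100_000):
    """Iterative policy evaluation for a fixed policy.

    P[s][a]: list of (probability, next_state) pairs.
    R[s][a]: immediate reward for taking action a in state s.
    """
    V = [0.0] * len(P)
    for _ in range(max_iter):
        new_V = [
            R[s][policy[s]]
            + gamma * sum(p * V[s2] for p, s2 in P[s][policy[s]])
            for s in range(len(P))
        ]
        delta = max(abs(n - o) for n, o in zip(new_V, V))
        V = new_V
        if delta < tol:
            break
    return V


# State 0 = low price (reward -1), state 1 = high price (reward +1).
R = [[-1.0], [1.0]]
policy = [0, 0]  # the only action: keep holding

# Estimated kernel: high state reached with probability 0.7 (assumed).
P_est = [[[(0.3, 0), (0.7, 1)]], [[(0.3, 0), (0.7, 1)]]]
# Hypothetical true kernel: high state reached with probability 0.4.
P_true = [[[(0.6, 0), (0.4, 1)]], [[(0.6, 0), (0.4, 1)]]]

v_est = evaluate_policy(policy, P_est, R)
v_true = evaluate_policy(policy, P_true, R)
```

In this toy the expected one-step reward differs by 0.6 between the two kernels (0.4 versus -0.2), but the discounted values differ by 0.6 / (1 - 0.9) = 6 in every state, and the sign of the value flips. A policy that looks attractive under the estimated kernel loses money under the other one.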

Overfitting in Quantitative Trading opposes Markov Decision Process Trading Model
Out-of-Sample Backtesting opposes Nasir et al 2021

Connections

Sources