Poisoning finite-horizon Markov decision processes at design time.
William N. Caballero, Phillip R. Jenkins, Andrew J. Keith
Published in: Comput. Oper. Res. (2021)
Keyphrases
- markov decision processes
- finite horizon
- optimal policy
- infinite horizon
- optimal stopping
- state space
- finite state
- reinforcement learning
- dynamic programming
- markov decision process
- multistage
- average cost
- policy iteration
- transition matrices
- average reward
- partially observable
- action space
- expected reward
- non stationary