A multicriteria competitive Markov decision process.
Antonio M. Rodríguez-Chía, Justo Puerto, Francisco R. Fernández
Published in: Math. Methods Oper. Res. (2002)
Keyphrases
- Markov decision process
- state space
- optimal policy
- Markov decision processes
- reinforcement learning
- infinite horizon
- decision problems
- finite horizon
- temporal difference learning
- partial observability
- initial state
- transition matrices
- policy iteration
- transition probabilities
- reward function
- dynamic programming
- Markov chain
- NP-hard