On learning history based policies for controlling Markov decision processes.
Gandharv Patil, Aditya Mahajan, Doina Precup
Published in: CoRR (2022)
Keyphrases
- Markov decision processes
- optimal policy
- reinforcement learning
- model based reinforcement learning
- macro actions
- learning tasks
- learning algorithm
- state space
- infinite horizon
- partially observable
- policy iteration
- transition matrices
- data mining
- decision processes
- finite state
- dynamic programming
- Markov decision process
- average cost
- stochastic games
- reward function