Policy Gradient Algorithms with Monte-Carlo Tree Search for Non-Markov Decision Processes.
Tetsuro Morimura, Kazuhiro Ota, Kenshi Abe, Peinan Zhang. Published in: CoRR (2022)
Keyphrases
- markov decision processes
- policy iteration
- reinforcement learning
- reinforcement learning algorithms
- state space
- optimal policy
- reinforcement learning methods
- dynamic programming
- partially observable markov decision processes
- finite state
- learning algorithm
- monte carlo
- average reward
- computational complexity
- actor critic
- monte carlo tree search
- sufficient conditions
- policy gradient