Model-based Reinforcement Learning with Scalable Composite Policy Gradient Estimators.
Paavo Parmas, Takuma Seno, Yuma Aoki
Published in: ICML (2023)
Keyphrases
- gradient estimators
- model-based reinforcement learning
- Markov decision processes
- optimal policy
- Markov decision problems
- policy iteration
- reinforcement learning
- partially observable Markov decision processes
- infinite horizon
- partially observable
- finite state
- average cost
- Markov decision process
- reward function
- action space
- linear programming
- state space
- action selection
- edge detector
- belief revision