Sample-Efficient Reinforcement Learning Based on Dynamics Models via Meta-policy Optimization.
Guoyu Zuo, Zhipeng Tian, Shuai Huang, Daoxiong Gong
Published in: ICCSIP (2021)
Keyphrases
- reinforcement learning
- optimization algorithm
- optimal policy
- probabilistic model
- complex systems
- control policies
- statistical models
- function approximation
- policy search
- sample size
- action selection
- dynamic model
- response surface
- genetic algorithm
- Markov decision process
- infinite horizon
- optimization method
- experimental data
- model selection
- learning process
- multi-agent