Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization.
Yuki Kadokawa, Lingwei Zhu, Yoshihisa Tsurumine, Takamitsu Matsubara
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- optimal policy
- real life
- computationally efficient
- machine learning
- markov decision process
- policy search
- domain specific
- policy iteration
- markov decision processes
- domain experts
- partially observable environments
- action selection
- function approximation
- decision problems
- domain independent
- privacy preserving
- state space
- decision making