Cyclic policy distillation: Sample-efficient sim-to-real reinforcement learning with domain randomization.
Yuki Kadokawa, Lingwei Zhu, Yoshihisa Tsurumine, Takamitsu Matsubara
Published in: Robotics Auton. Syst. (2023)
Keyphrases
- reinforcement learning
- optimal policy
- partially observable domains
- policy search
- multi agent
- learning process
- domain independent
- state space
- domain specific
- sufficient conditions
- computationally efficient
- real world
- privacy preserving
- domain experts
- action selection
- transition model
- inverse reinforcement learning
- partially observable environments
- data sets