Unsupervised Domain Adaptation with Dynamics-Aware Rewards in Reinforcement Learning.
Jinxin Liu, Hao Shen, Donglin Wang, Yachen Kang, Qiangxing Tian
Published in: NeurIPS (2021)
Keyphrases
- reinforcement learning
- Markov decision processes
- function approximation
- reinforcement learning algorithms
- optimal policy
- state space
- learning algorithm
- reward function
- model free
- dynamic model
- multi agent
- learning process
- optimal control
- temporal difference
- action selection
- data sets
- dynamical systems
- dynamic programming
- transfer learning
- Markov decision process
- control policy
- temporal difference learning
- learning capabilities
- function approximators
- learning agents
- policy search
- total reward