Dueling RL: Reinforcement Learning with Trajectory Preferences.
Aldo Pacchiano, Aadirupa Saha, Jonathan Lee
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- function approximation
- reinforcement learning algorithms
- state space
- model free
- temporal difference
- user preferences
- markov decision processes
- learning algorithm
- rl algorithms
- optimal policy
- dynamic programming
- multi agent
- transfer learning
- direct policy search
- optimal control
- control problems
- policy iteration
- continuous state
- machine learning
- action space
- markov decision process
- trajectory data
- learning process
- learning problems
- reinforcement learning methods
- markov decision problems
- autonomous learning
- policy search
- continuous state and action spaces