Preference-based Reinforcement Learning with Finite-Time Guarantees.
Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski
Published in: NeurIPS (2020)
Keyphrases
- reinforcement learning
- state and action spaces
- function approximation
- state space
- Markov decision processes
- reinforcement learning algorithms
- model free
- reinforcement learning methods
- temporal difference learning
- finite number
- optimal control
- temporal difference
- transfer learning
- machine learning
- data sets
- optimal policy
- user preferences
- active learning
- learning environment
- learning capabilities
- partially observable
- function approximators
- theoretical guarantees
- finite automata
- objective function
- policy search
- data mining