Preference-based Reinforcement Learning with Finite-Time Guarantees.
Yichong Xu, Ruosong Wang, Lin F. Yang, Aarti Singh, Artur Dubrawski
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- state and action spaces
- function approximation
- model free
- robotic control
- temporal difference
- finite number
- optimal policy
- Markov decision processes
- machine learning
- learning capabilities
- user preferences
- multi agent
- learning algorithm
- optimal control
- transfer learning
- logic programs
- finite automata
- multi agent reinforcement learning
- data sets
- unit length