Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions.
Tom Bewley, Freddy Lécué
Published in: AAMAS (2022)
Keyphrases
- reward function
- reinforcement learning
- reinforcement learning algorithms
- Markov decision processes
- policy search
- state space
- optimal policy
- Markov decision process
- partially observable
- inverse reinforcement learning
- transition model
- state action
- multiple agents
- function approximation
- learning algorithm
- initially unknown
- temporal difference
- structured data
- control policies
- dynamic programming
- machine learning
- model free
- learning agent
- multi agent
- preference relations
- generative model
- continuous state
- hidden Markov models