Interpretable Preference-based Reinforcement Learning with Tree-Structured Reward Functions.
Tom Bewley, Freddy Lécué. Published in: CoRR (2021)
Keyphrases
- reward function
- reinforcement learning
- Markov decision processes
- reinforcement learning algorithms
- policy search
- optimal policy
- state space
- Markov decision process
- inverse reinforcement learning
- partially observable
- function approximation
- multiple agents
- transition model
- Markov decision problems
- state action
- transition probabilities
- state variables
- temporal difference
- learning algorithm
- initially unknown
- structured data
- higher order
- multi-agent
- finite state
- dynamic programming
- multi-agent systems
- optimal solution