Monte Carlo Tree Search to Compare Reward Functions for Reinforcement Learning.
Bálint Kövári, Bálint Pelenczei, Tamás Bécsi
Published in: SACI (2022)
Keyphrases
- reinforcement learning
- reward function
- Monte Carlo tree search
- Bayesian reinforcement learning
- reinforcement learning methods
- optimal policy
- reinforcement learning algorithms
- temporal difference
- policy search
- Monte Carlo
- Markov decision processes
- temporal difference learning
- state space
- Markov decision process
- inverse reinforcement learning
- partially observable
- function approximation
- multiple agents
- evaluation function
- multi agent
- machine learning
- model free
- Markov decision problems
- control problems
- dynamic programming
- supervised learning
- Markov chain
- decision problems
- learning process
- game playing