A Predictive Reward Function for Human-Like Driving Based on a Transition Model of Surrounding Environment.
Daiki Hayashi, Yunfei Xu, Takashi Bando, Kazuya Takeda
Published in: ICRA (2019)
Keyphrases
- transition model
- reward function
- initially unknown
- reinforcement learning
- markov decision processes
- state space
- optimal policy
- multiple agents
- inverse reinforcement learning
- state transition
- partially observable
- dynamic environments
- learning agent
- transition probabilities
- long run
- reinforcement learning algorithms
- markov decision process