Learning Reward Machines for Partially Observable Reinforcement Learning.
Rodrigo Toro Icarte, Ethan Waldie, Toryn Q. Klassen, Richard Anthony Valenzano, Margarita P. Castro, Sheila A. McIlraith
Published in: NeurIPS (2019)
Keyphrases
- reinforcement learning
- partially observable
- partially observable environments
- state space
- inverse reinforcement learning
- partially observable domains
- hidden state
- Markov decision processes
- learning process
- reward function
- learning algorithm
- action models
- function approximation
- dynamical systems
- multi agent
- partial observability
- temporal difference
- optimal policy
- Markov decision problems
- learning agent
- decision problems
- partial observations
- reinforcement learning algorithms
- machine learning
- continuous state
- Markov chain
- action selection
- learning tasks
- average reward
- state action
- probabilistic model
- learning capabilities
- dynamic programming
- model free
- hidden Markov models
- orders of magnitude
- transfer learning