Learning reward machines: A study in partially observable reinforcement learning.
Rodrigo Toro Icarte, Toryn Q. Klassen, Richard Valenzano, Margarita P. Castro, Ethan Waldie, Sheila A. McIlraith
Published in: Artif. Intell. (2023)
Keyphrases
- reinforcement learning
- partially observable
- partially observable environments
- learning algorithm
- state space
- hidden state
- learning process
- Markov decision processes
- action models
- partially observable domains
- inverse reinforcement learning
- function approximation
- partial observations
- reinforcement learning algorithms
- dynamical systems
- reward function
- decision problems
- model free
- partial observability
- state action
- Markov decision problems
- machine learning
- multi agent
- dynamic programming
- probabilistic model
- optimal policy
- orders of magnitude
- function approximators
- policy iteration
- initially unknown
- temporal difference
- planning domains
- optimal control