Reinforcement Learning with Stochastic Reward Machines.
Jan Corazza, Ivan Gavran, Daniel Neider
Published in: AAAI (2022)
Keyphrases
- reinforcement learning
- direct policy search
- learning automata
- function approximation
- reinforcement learning algorithms
- control policies
- state space
- continuous state spaces
- learning problems
- control problems
- stochastic approximation
- optimal policy
- reward function
- multi-armed bandit
- temporal difference
- eligibility traces
- partially observable environments
- markov decision processes
- learning algorithm
- model free
- monte carlo
- average reward
- learning classifier systems
- approximate dynamic programming
- stochastic processes
- reinforcement learning methods
- temporal difference learning
- learning agent
- machine learning
- state transition
- optimal control
- transfer learning