Exploiting Reward Machines with Deep Reinforcement Learning in Continuous Action Domains.
Haolin Sun, Yves Lespérance
Published in: EUMAS (2023)
Keyphrases
- reinforcement learning
- policy search
- continuous action
- continuous state
- action space
- transfer learning
- partially observable Markov decision processes
- reward function
- reinforcement learning algorithms
- function approximation
- state space
- Markov decision processes
- optimal policy
- policy gradient
- model free
- temporal difference
- supervised learning
- machine learning
- multi agent
- function approximators
- average reward
- state action
- hidden state
- learning problems
- long run
- partially observable
- learning agent
- optimal control
- Markov decision problems
- dynamic programming
- control strategies