Reward-Machine-Guided, Self-Paced Reinforcement Learning.
Cevahir Köprülü, Ufuk Topcu. Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- function approximation
- state space
- eligibility traces
- multi-agent
- reinforcement learning algorithms
- reward function
- model-free
- partially observable environments
- partially observable
- Markov decision processes
- optimal control
- total reward
- learning algorithm
- learning agent
- neural network
- optimal policy
- action selection
- machine learning
- policy gradient
- single-agent
- reward shaping
- policy evaluation
- batch processing
- average reward
- dynamical systems
- learning process
- dynamic programming
- mobile robot
- flowshop