Optimism and Delays in Episodic Reinforcement Learning.
Benjamin Howson, Ciara Pike-Burke, Sarah Filippi
Published in: AISTATS (2023)
Keyphrases
- reinforcement learning
- multi agent
- machine learning
- reinforcement learning algorithms
- function approximation
- model free
- state space
- robotic control
- search algorithm
- optimal policy
- markov decision processes
- reward function
- temporal difference
- evolutionary learning
- continuous state
- multi agent reinforcement learning
- policy search
- database
- direct policy search
- autonomous learning
- action selection
- evolutionary algorithm
- learning algorithm
- real world