Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations.
Albert Wilcox, Ashwin Balakrishna, Jules Dedieu, Wyame Benslimane, Daniel S. Brown, Ken Goldberg
Published in: CoRR (2022)
Keyphrases
- monte carlo
- actor critic
- reinforcement learning
- policy gradient
- temporal difference
- variance reduction
- reinforcement learning algorithms
- average reward
- markov chain
- function approximation
- policy iteration
- importance sampling
- state space
- approximate dynamic programming
- temporal difference learning
- learning algorithm
- optimal control
- markov decision processes
- model free
- particle filter
- optimal policy
- function approximators
- machine learning
- multi agent
- policy evaluation
- sparse representation
- control problems
- gradient method
- rl algorithms
- partially observable markov decision processes
- neuro fuzzy
- state action
- partially observable
- dynamic programming
- active learning
- optimal solution
- decision problems
- long run