Long-Term Visitation Value for Deep Exploration in Sparse-Reward Reinforcement Learning.
Simone Parisi, Davide Tateo, Maximilian Hensel, Carlo D'Eramo, Jan Peters, Joni Pajarinen
Published in: Algorithms (2022)
Keyphrases
- reinforcement learning
- long term
- exploration strategy
- exploration exploitation
- action selection
- short term
- active exploration
- function approximation
- state space
- eligibility traces
- reinforcement learning algorithms
- optimal policy
- model free
- sparse data
- temporal difference
- reward function
- robotic control
- exploration exploitation tradeoff
- learning algorithm
- sparse representation
- optimal control
- learning problems
- model based reinforcement learning
- deep learning
- learning agent
- partially observable environments
- learning process
- balancing exploration and exploitation
- markov decision processes
- compressed sensing
- multi agent
- high dimensional
- decision problems
- partially observable
- total reward
- policy gradient
- state and action spaces
- transfer learning
- autonomous learning
- sparse matrix
- average reward
- markov decision process
- machine learning
- sparse coding