One Policy is Enough: Parallel Exploration with a Single Policy is Near-Optimal for Reward-Free Reinforcement Learning
Pedro Cisneros-Velarde, Boxiang Lyu, Sanmi Koyejo, Mladen Kolar. Published in: AISTATS (2023)
Keyphrases
- reinforcement learning
- optimal policy
- action selection
- partially observable environments
- reward function
- policy search
- Markov decision process
- control policy
- inverse reinforcement learning
- average reward
- policy gradient
- total reward
- reinforcement learning problems
- state-action
- Markov decision problems
- approximate dynamic programming
- policy making
- partially observable
- long run
- state space
- Markov decision processes
- actor critic
- control policies
- state and action spaces
- finite horizon
- policy iteration
- reinforcement learning algorithms
- function approximation
- parallel processing
- policy makers
- partially observable Markov decision processes
- asymptotically optimal
- temporal difference
- learning process
- infinite horizon
- finite state
- discounted reward
- machine learning