A Provably Efficient Model-Free Posterior Sampling Method for Episodic Reinforcement Learning.
Christoph Dann, Mehryar Mohri, Tong Zhang, Julian Zimmert
Published in: NeurIPS (2021)
Keyphrases
- model free
- reinforcement learning
- reinforcement learning algorithms
- function approximation
- temporal difference
- policy iteration
- learning algorithm
- impedance control
- multi agent
- average reward
- reinforcement learning methods
- machine learning
- optimal control
- optimal policy
- adaptive control
- state space
- dynamic programming