Posterior Sampling for Competitive RL: Function Approximation and Partial Observation.
Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang
Published in: CoRR (2023)
Keyphrases
- function approximation
- reinforcement learning
- tile coding
- temporal difference
- temporal difference learning
- model free
- temporal difference learning algorithms
- radial basis function
- learning tasks
- td learning
- probability distribution
- reinforcement learning algorithms
- exploration exploitation tradeoff
- temporal difference methods
- optimal policy
- td methods
- policy evaluation
- multi agent
- single agent
- neural network
- state space