Posterior Sampling for Competitive RL: Function Approximation and Partial Observation.

Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang

Published in: CoRR (2023)

Keyphrases

function approximation
reinforcement learning
tile coding
temporal difference
temporal difference learning
model free
temporal difference learning algorithms
radial basis function
learning tasks
td learning
probability distribution
reinforcement learning algorithms
exploration exploitation tradeoff
temporal difference methods
optimal policy
td methods
policy evaluation
multi agent
single agent
neural network
state space