Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path.
Muhammad Aneeq uz Zaman, Alec Koppel, Sujay Bhatt, Tamer Basar
Published in: CoRR (2022)
Keyphrases
- sample path
- reinforcement learning
- policy iteration
- asymptotic analysis
- serial inventory systems
- lost sales
- optimal policy
- markov decision processes
- markov chain
- function approximation
- average reward
- markov random field
- state space
- multi agent
- bayesian inference
- model free
- decision problems
- multi item
- monte carlo
- large deviations
- least squares