Nearly Minimax Optimal Offline Reinforcement Learning with Linear Function Approximation: Single-Agent MDP and Markov Game.
Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Liwei Wang, Tong Zhang
Published in: ICLR (2023)
Keyphrases
- reinforcement learning
- function approximation
- single agent
- temporal difference learning algorithms
- multi agent
- function approximators
- stochastic games
- markov decision processes
- state space
- action space
- optimal policy
- average reward
- dynamic programming
- temporal difference
- optimal control
- temporal difference learning
- model free
- reinforcement learning algorithms
- markov decision process
- control policies
- approximate dynamic programming
- policy gradient
- decision problems
- markov chain
- control policy
- average cost
- learning agent
- markov decision problems
- action selection
- partially observable
- learning algorithm
- multi agent systems
- dynamic environments
- neural network
- multiple agents
- state action
- continuous state
- optimal solution
- partially observable markov decision processes
- long run
- learning process
- search space
- reinforcement learning methods
- search algorithm
- cooperative
- policy search
- machine learning