On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game.
Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang
Published in: ICML (2021)
Keyphrases
- single agent
- reinforcement learning
- policy gradient
- stochastic games
- multi agent
- markov decision processes
- action space
- average reward
- state action
- reward function
- multiple agents
- optimal policy
- learning agent
- decision problems
- state space
- function approximators
- learning agents
- approximation methods
- multi agent systems
- multi agent coordination
- markov decision process
- reinforcement learning algorithms
- two player games
- dynamic environments
- partially observable markov decision processes
- model free
- control policy
- window search
- long run
- dynamic programming
- exploration strategy
- dec pomdps
- temporal difference
- discounted reward
- continuous state
- function approximation
- agent learns
- total reward
- markov chain
- game theory
- inverse reinforcement learning
- nash equilibrium
- state and action spaces
- finite state
- markov decision problems
- sufficient conditions
- efficient computation
- nash equilibria