Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs.
Kaixuan Ji, Qingyue Zhao, Jiafan He, Weitong Zhang, Quanquan Gu
Published in: ICLR (2024)
Keyphrases
- reinforcement learning
- markov decision processes
- state space
- optimal policy
- multi agent
- markov decision process
- function approximation
- state and action spaces
- continuous state and action spaces
- function approximators
- mixture model
- reinforcement learning algorithms
- policy iteration
- partially observable
- action sets
- dynamic programming
- control problems
- finite state
- action space
- factored markov decision processes
- policy search
- transition model
- approximate dynamic programming
- markov decision problems
- average reward
- action selection
- planning under uncertainty
- continuous state
- reinforcement learning methods
- control policy
- factored mdps
- finite horizon
- stochastic processes
- reward function