Near-optimal Policy Optimization Algorithms for Learning Adversarial Linear Mixture MDPs.
Jiafan He, Dongruo Zhou, Quanquan Gu. Published in: AISTATS (2022)
Keyphrases
- optimal policy
- reinforcement learning
- Markov decision processes
- learning algorithm
- policy iteration
- state space
- finite horizon
- optimization problems
- dynamic programming
- dynamic programming algorithms
- Markov decision process
- infinite horizon
- average reward reinforcement learning
- reinforcement learning methods
- function approximators
- partially observable Markov decision processes
- decision problems
- control policies
- optimal solution
- policy evaluation
- long run
- model free
- cost function
- multistage
- multi agent
- Bayesian reinforcement learning
- search space
- state dependent
- Monte Carlo
- average cost
- partially observable
- reward function
- temporal difference