Online learning in MDPs with linear function approximation and bandit feedback.
Gergely Neu, Julia Olkhovskaya
Published in: CoRR (2020)
Keyphrases
- function approximation
- online learning
- reinforcement learning
- temporal difference learning algorithms
- function approximators
- regret bounds
- Markov decision processes
- temporal difference
- model free
- radial basis function
- temporal difference learning
- active learning
- policy evaluation
- policy search
- learning tasks
- multi agent
- e-learning
- state space
- Markov decision problems
- reinforcement learning algorithms
- optimal policy
- reinforcement learning problems
- temporal difference methods
- reinforcement learning methods
- finite state
- Markov chain
- least squares