Regret Bounds for Kernel-Based Reinforcement Learning.
Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- regret bounds
- multi armed bandit
- online learning
- state space
- support vector machine
- lower bound
- learning process
- model free
- linear regression
- optimal policy
- learning algorithm
- temporal difference
- kernel methods
- e learning
- learning problems
- online convex optimization
- markov decision processes
- upper bound
- gaussian mixture
- probabilistic model
- support vector
- machine learning