No-regret Exploration in Contextual Reinforcement Learning.
Aditya Modi, Ambuj Tewari
Published in: UAI (2020)
Keyphrases
- reinforcement learning
- exploration exploitation
- active exploration
- bandit problems
- exploration strategy
- action selection
- total reward
- autonomous learning
- model based reinforcement learning
- contextual information
- online learning
- reward function
- markov decision processes
- exploration exploitation tradeoff
- function approximation
- lower bound
- active learning
- state space
- balancing exploration and exploitation
- model free
- reinforcement learning algorithms
- machine learning
- context sensitive
- context dependent
- robotic control
- expert advice
- worst case
- minimax regret
- dynamic programming
- weighted majority
- objective function
- supervised learning
- context aware
- temporal difference
- learning problems
- binary classification
- optimal control