No-Regret Exploration in Goal-Oriented Reinforcement Learning.
Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric
Published in: ICML (2020)
Keyphrases
- goal oriented
- reinforcement learning
- exploration exploitation
- active exploration
- action selection
- bandit problems
- exploration strategy
- model based reinforcement learning
- total reward
- decentralized control
- process oriented
- online learning
- function approximation
- balancing exploration and exploitation
- requirements analysis
- state space
- markov decision processes
- active learning
- reward function
- confidence bounds
- temporal difference
- data marts
- machine learning
- requirements engineering
- multi armed bandit
- optimal control
- exploration exploitation tradeoff
- worst case
- authoring environment
- optimal policy
- minimax regret
- reinforcement learning algorithms
- autonomous learning
- loss function
- dynamic programming
- lower bound
- multi agent
- artificial intelligence
- data mining
- expert advice