No-Regret Exploration in Goal-Oriented Reinforcement Learning.
Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric
Published in: CoRR (2019)
Keyphrases
- goal oriented
- reinforcement learning
- exploration exploitation
- exploration strategy
- active exploration
- bandit problems
- action selection
- model based reinforcement learning
- total reward
- balancing exploration and exploitation
- function approximation
- requirements analysis
- reward function
- online learning
- decentralized control
- lower bound
- data marts
- markov decision processes
- active learning
- loss function
- optimal policy
- model free
- state space
- process oriented
- exploration exploitation tradeoff
- worst case
- multi armed bandit
- reinforcement learning algorithms
- requirements engineering
- multi agent
- machine learning
- autonomous learning
- expert advice
- databases
- temporal difference
- learning problems
- decision problems
- knowledge acquisition
- support vector
- learning algorithm