DOPE: Doubly Optimistic and Pessimistic Exploration for Safe Reinforcement Learning.
Archana Bura, Aria HasanzadeZonuzy, Dileep Kalathil, Srinivas Shakkottai, Jean-François Chamberland
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- active exploration
- exploration strategy
- action selection
- model based reinforcement learning
- autonomous learning
- function approximation
- exploration exploitation
- model free
- state space
- supervised learning
- optimal control
- exploration exploitation tradeoff
- reinforcement learning algorithms
- transfer learning
- Markov decision processes
- learning problems
- optimal policy
- multi agent
- dynamic programming
- learning process
- machine learning
- balancing exploration and exploitation
- information retrieval
- data structure
- active learning
- temporal difference learning
- action space
- possibility theory
- partially observable
- learning capabilities
- temporal difference
- least squares
- learning classifier systems
- data sets