Unified Algorithms for RL with Decision-Estimation Coefficients: No-Regret, PAC, and Reward-Free Learning.

Fan Chen, Song Mei, Yu Bai

Published in: CoRR (2022)

Keyphrases

reinforcement learning
learning algorithm
noise tolerant
learning process
online learning
reinforcement learning methods
learning agent
multi armed bandit
machine learning
bandit problems
inverse reinforcement learning
learning tasks
active learning
learning models
machine learning algorithms
worst case
supervised learning
learning problems
learning classifier systems
reward function
function approximators
decision makers
rl algorithms
autonomous learning
lower bound