Private Reinforcement Learning with PAC and Regret Guarantees.
Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Zhiwei Steven Wu
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- total reward
- lower bound
- reward function
- online learning
- function approximation
- reinforcement learning algorithms
- temporal difference
- privacy preserving
- state space
- upper bound
- learning algorithm
- private data
- loss function
- expert advice
- optimal policy
- worst case
- sample complexity
- sample size
- supervised learning
- model free
- VC dimension
- game theory
- Markov decision processes
- learning process
- machine learning
- optimal control
- multi class
- mistake bound
- bandit problems
- multi armed bandit
- confidence bounds
- reward signal
- multi armed bandit problems