Private Reinforcement Learning with PAC and Regret Guarantees.
Giuseppe Vietri, Borja Balle, Akshay Krishnamurthy, Zhiwei Steven Wu
Published in: ICML (2020)
Keyphrases
- reinforcement learning
- total reward
- function approximation
- online learning
- reward function
- learning algorithm
- reinforcement learning algorithms
- lower bound
- sample complexity
- multi-armed bandit
- expert advice
- private data
- optimal policy
- privacy-preserving
- dynamic programming
- state space
- learning process
- upper bound
- confidence bounds
- model-free
- temporal difference
- worst case
- machine learning
- multi-agent
- PAC learning
- optimal control
- learning problems
- bandit problems
- reward signal
- noise-tolerant
- theoretical guarantees
- Markov decision process
- VC dimension
- transfer learning