Provable Defense against Backdoor Policies in Reinforcement Learning.
Shubham Kumar Bharti, Xuezhou Zhang, Adish Singla, Jerry Zhu
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- control policies
- Markov decision process
- reward function
- state space
- partially observable Markov decision processes
- fitted Q iteration
- Markov decision problems
- function approximation
- reinforcement learning algorithms
- policy gradient methods
- reinforcement learning agents
- hierarchical reinforcement learning
- decision problems
- Markov decision processes
- intrusion detection
- total reward
- temporal difference
- control policy
- temporal difference learning
- model-free
- machine learning
- learning process
- partially observable
- long run
- average reward
- continuous state
- supervised learning
- neural network
- infinite horizon
- learning agents
- transfer learning
- function approximators
- dynamic programming