Provable Defense against Backdoor Policies in Reinforcement Learning.
Shubham Kumar Bharti, Xuezhou Zhang, Adish Singla, Xiaojin Zhu
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- control policies
- markov decision process
- fitted q iteration
- control policy
- markov decision processes
- state space
- reward function
- function approximation
- network security
- markov decision problems
- intrusion detection
- partially observable markov decision processes
- policy gradient methods
- reinforcement learning agents
- decision problems
- hierarchical reinforcement learning
- reinforcement learning algorithms
- model free
- temporal difference learning
- continuous state
- total reward
- dynamic programming
- long run
- machine learning
- function approximators
- learning problems
- learning process
- tabula rasa
- decentralized control
- advanced research projects agency
- management policies
- multi agent reinforcement learning
- policy iteration
- average cost
- action selection
- infinite horizon
- finite state
- sufficient conditions
- supervised learning