Regret-based Defense in Adversarial Reinforcement Learning.
Roman Belaire, Pradeep Varakantham, Thanh Hong Nguyen, David Lo
Published in: AAMAS (2024)
Keyphrases
- reinforcement learning
- multi agent
- reward function
- total reward
- online learning
- reinforcement learning algorithms
- lower bound
- function approximation
- intrusion detection
- temporal difference
- state space
- learning algorithm
- model free
- worst case
- Markov decision processes
- robotic control
- machine learning
- loss function
- network security
- binary classification
- expert advice
- reward signal
- reinforcement learning methods
- temporal difference learning
- learning problems
- optimal policy
- multiple agents
- optimal control
- supervised learning
- policy search
- upper bound
- support vector
- neural network