Protecting Reward Function of Reinforcement Learning via Minimal and Non-catastrophic Adversarial Trajectory.
Tong Chen, Yingxiao Xiang, Yike Li, Yunzhe Tian, Endong Tong, Wenjia Niu, Jiqiang Liu, Gang Li, Qi Alfred Chen
Published in: SRDS (2021)
Keyphrases
- reward function
- reinforcement learning
- reinforcement learning algorithms
- markov decision processes
- state space
- optimal policy
- multi agent
- policy search
- partially observable
- multiple agents
- transition model
- function approximation
- markov decision process
- inverse reinforcement learning
- transition probabilities
- initially unknown
- learning agent
- state variables
- markov chain
- dynamic programming
- bayesian networks
- machine learning
- hierarchical reinforcement learning
- state action
- dynamical systems
- generative model
- learning algorithm