On The Fragility of Learned Reward Functions.
Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
Published in: CoRR (2023)
Keyphrases
- reward function
- markov decision processes
- state space
- reinforcement learning
- inverse reinforcement learning
- reinforcement learning algorithms
- multiple agents
- simple examples
- optimal policy
- policy search
- decision makers
- markov decision process
- markov decision problems
- random walk
- information extraction
- state action
- initially unknown