Learning Rewards to Optimize Global Performance Metrics in Deep Reinforcement Learning.
Junqi Qian, Paul Weng, Chenmien Tan
Published in: AAMAS (2023)
Keyphrases
- reinforcement learning
- learning algorithm
- learning process
- state space
- reinforcement learning algorithms
- learning problems
- function approximation
- Markov decision processes
- mobile learning
- learning tasks
- optimal policy
- optimal control
- online learning
- inductive inference
- model free
- action selection
- supervised learning
- partially observable
- deep learning
- state action
- evolutionary learning
- state abstraction