Learning Interpretable Policies in Hindsight-Observable POMDPs through Partially Supervised Reinforcement Learning.
Michael LanierYing XuNathan JacobsChongjie ZhangYevgeniy VorobeychikPublished in: CoRR (2024)
Keyphrases
- reinforcement learning
- learning process
- policy gradient methods
- policy search
- optimal policy
- learning problems
- learning algorithm
- partially observable markov decision processes
- action selection
- state space
- model free
- partially supervised
- partially observable
- function approximation
- predictive state representations
- reinforcement learning algorithms
- data mining
- supervised learning
- natural actor critic
- decision trees
- function approximators
- high dimensional
- reward function
- active learning
- learning tasks
- markov decision processes