Learning Two-Step Hybrid Policy for Graph-Based Interpretable Reinforcement Learning.
Tongzhou Mu, Kaixiang Lin, Feiyang Niu, Govind Thattai
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- learning algorithm
- learning problems
- learning process
- prior knowledge
- action selection
- supervised learning
- optimal policy
- online learning
- partially observable environments
- machine learning
- actor critic
- evolutionary learning
- active learning
- learning systems
- dynamic programming
- multi agent
- learning phase
- policy gradient methods
- eligibility traces
- relational reinforcement learning
- policy search
- temporal difference learning
- Markov decision process
- partially observable
- hybrid learning
- learning tasks
- Markov decision processes