Model-free Policy Learning with Reward Gradients.
Qingfeng Lan, A. Rupam Mahmood
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- model free
- learning process
- learning tasks
- learning algorithm
- training data
- partially observable environments
- decision trees
- prior knowledge
- machine learning
- state action
- temporal difference
- supervised learning
- inverse reinforcement learning
- average reward
- policy iteration
- optimal policy
- artificial neural networks
- genetic algorithm