Model-free Policy Learning with Reward Gradients.
Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, Rupam Mahmood
Published in: AISTATS (2022)
Keyphrases
- reinforcement learning
- model free
- learning process
- partially observable environments
- learning algorithm
- optimal policy
- action selection
- neural network
- least squares
- policy gradient
- text classification
- k nearest neighbor
- learning tasks
- learning problems
- decision trees
- adaptive control
- temporal difference
- policy iteration
- state action