Model-free Policy Learning with Reward Gradients

Qingfeng Lan, Samuele Tosatto, Homayoon Farrahi, Rupam Mahmood

Published in: AISTATS (2022)

Keyphrases

reinforcement learning
model-free
learning process
partially observable environments
learning algorithm
optimal policy
action selection
neural network
least squares
policy gradient
text classification
k nearest neighbor
learning tasks
learning problems
decision trees
adaptive control
temporal difference
policy iteration
state-action