Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error.
Bumgeun Park, Taeyoung Kim, Woohyeon Moon, Luiz Felipe Vecchietti, Dongsoo Har
Published in: CoRR (2022)
Keyphrases
- loss function
- temporal difference
- reinforcement learning
- function approximation
- TD learning
- reinforcement learning algorithms
- pairwise
- evaluation function
- model free
- Monte Carlo
- temporal difference learning
- support vector
- action selection
- step size
- temporal difference methods
- state space
- function approximators
- policy evaluation
- actor critic
- learning algorithm
- machine learning
- policy iteration
- Markov decision processes
- collaborative filtering
- supervised learning
- least squares
- learning process