Off-Policy Reinforcement Learning with Loss Function Weighted by Temporal Difference Error.
Bumgeun Park, Taeyoung Kim, Woohyeon Moon, Sarvar Hussain Nengroo, Dongsoo Har
Published in: ICIC (5) (2023)
Keyphrases
- loss function
- temporal difference
- reinforcement learning
- function approximation
- td learning
- evaluation function
- pairwise
- reinforcement learning algorithms
- monte carlo
- temporal difference learning
- model free
- action selection
- function approximators
- policy evaluation
- temporal difference methods
- support vector
- step size
- actor critic
- state space
- dynamic programming
- optimal policy
- policy iteration
- supervised learning
- reinforcement learning problems
- learning algorithm
- evolutionary algorithm
- data mining
- reinforcement learning methods
- policy search
- machine learning