Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients.
Baturay Saglam, Furkan B. Mutlu, Dogan C. Cicek, Suleyman Serdar Kozat
Published in: Neural Process. Lett. (2024)
Keyphrases
- parameter free
- reinforcement learning
- optimal policy
- categorical data
- policy search
- action selection
- outlier detection
- Markov decision process
- partially observable domains
- control policy
- function approximation
- Markov decision processes
- state space
- function approximators
- Markov decision problems
- partially observable
- partially observable environments
- deterministic domains
- reward function
- control policies
- fully observable
- databases
- temporal difference
- fully automatic
- partially observable Markov decision processes
- reinforcement learning algorithms
- fluid model
- policy gradient
- reinforcement learning problems
- machine learning
- RL algorithms
- policy evaluation
- continuous state spaces
- infinite horizon
- agent learns
- state and action spaces
- learning algorithm
- data sets