Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies.
Sébastien Gros, Mario Zanon
Published in: ACC (2021)
Keyphrases
- gradient method
- bias correction
- policy gradient
- policy search
- optimal policy
- actor critic
- reinforcement learning
- Markov decision process
- convergence rate
- non-negative matrix factorization
- step size
- state space
- Markov decision processes
- reward function
- optimization methods
- Markov decision problems
- partially observable Markov decision processes
- intensity inhomogeneity
- semi supervised
- policy iteration
- control policy
- reinforcement learning algorithms
- average reward
- optimal control
- sample selection
- temporal difference
- machine learning