Actor-Critic with Variable Time Discretization via Sustained Actions.
Jakub Lyskawa, Pawel Wawrzynski
Published in: ICONIP (1) (2023)
Keyphrases
- actor critic
- reinforcement learning
- policy gradient
- optimal control
- approximate dynamic programming
- reinforcement learning algorithms
- function approximation
- temporal difference
- gradient method
- continuous variables
- policy iteration
- planning problems
- action selection
- partially observable
- reward function
- action space
- step size
- dynamic programming