OSSP-PTA: An Online Stochastic Stepping Policy for PTA on Reinforcement Learning.
Dan Niu, Yichao Dong, Zhou Jin, Chuan Zhang, Qi Li, Changyin Sun
Published in: IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. (2023)
Keyphrases
- reinforcement learning
- optimal policy
- control policies
- direct policy search
- continuous state spaces
- model-free reinforcement learning
- Markov decision processes
- online learning
- policy search
- stochastic approximation
- policy iteration algorithm
- state space
- action space
- reinforcement learning algorithms
- state dependent
- action selection
- policy gradient
- Markov decision process
- state and action spaces
- function approximation
- reinforcement learning problems
- partially observable environments
- policy iteration
- partially observable
- real time
- learning algorithm
- Markov decision problems
- dynamic programming
- machine learning
- balancing exploration and exploitation
- approximate dynamic programming
- RL algorithms
- function approximators
- finite horizon
- control problems
- infinite horizon
- multi agent