Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces.
Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow
Published in: CoRR (2019)
Keyphrases
- action space
- control policies
- continuous action
- policy search
- direct optimization
- continuous state
- optimal policy
- state space
- continuous state spaces
- Markov decision process
- Markov decision processes
- partially observable Markov decision processes
- reinforcement learning
- Markov decision problems
- learning to rank
- state and action spaces
- real valued
- optimization methods
- stochastic processes
- evaluation measures
- sparse PCA
- dynamic programming
- action selection
- finite state
- decision problems
- policy iteration
- reinforcement learning algorithms
- reward function
- infinite horizon
- average reward
- policy gradient
- control policy
- state dependent
- partially observable Markov decision process
- machine learning
- decision making