Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces.
Guy Lorberbom, Chris J. Maddison, Nicolas Heess, Tamir Hazan, Daniel Tarlow
Published in: NeurIPS (2020)
Keyphrases
- continuous action
- policy search
- action space
- control policies
- direct optimization
- continuous state
- continuous state spaces
- optimal policy
- state space
- reinforcement learning
- partially observable markov decision processes
- markov decision process
- markov decision processes
- markov decision problems
- learning to rank
- real valued
- state and action spaces
- optimization methods
- action selection
- dynamic programming
- reinforcement learning algorithms
- stochastic processes
- reward function
- sparse pca
- finite state
- belief state
- state dependent
- control policy
- machine learning
- evaluation measures
- decision problems
- single agent
- policy gradient
- random variables
- search algorithm
- optimal solution
- decision making
- genetic algorithm