Reward Constrained Policy Optimization.
Chen Tessler, Daniel J. Mankowitz, Shie Mannor
Published in: CoRR (2018)
Keyphrases
- concave convex procedure
- reinforcement learning
- partially observable environments
- optimization algorithm
- average reward
- long run
- lagrange multipliers
- constrained optimization
- global optimization
- optimization problems
- optimization methods
- neural network
- reward function
- optimal policy
- genetic algorithm
- expected cost
- partially observable
- optimization method
- discrete optimization
- policy gradient