Reward Constrained Policy Optimization.
Chen Tessler, Daniel J. Mankowitz, Shie Mannor
Published in: ICLR (Poster) (2019)
Keyphrases
- concave convex procedure
- partially observable environments
- average reward
- optimization process
- reward function
- long run
- global optimization
- optimization algorithm
- optimization problems
- direct search
- inverse reinforcement learning
- optimal policy
- optimization method
- genetic algorithm
- saddle point
- search space
- reinforcement learning