Robust Policy Optimization in Deep Reinforcement Learning.
Md Masudur Rahman, Yexiang Xue. Published in: CoRR (2022)
Keyphrases
- simultaneous optimization
- reinforcement learning
- optimal policy
- policy search
- dynamic programming
- action selection
- Markov decision processes
- machine learning
- action space
- Markov decision process
- simulated annealing
- learning process
- model free
- reward function
- learning algorithm
- policy iteration
- state and action spaces
- partially observable environments
- actor critic
- approximate dynamic programming
- Markov decision problems
- average reward
- control policy
- optimization algorithm
- state space
- multi objective
- genetic algorithm