End-to-End Policy Gradient Method for POMDPs and Explainable Agents.
Soichiro Nishimori, Sotetsu Koyamada, Shin Ishii
Published in: CoRR (2023)
Keyphrases
- end to end
- gradient method
- policy gradient
- actor critic
- admission control
- convergence rate
- multi agent
- expected reward
- step size
- optimization methods
- multiple agents
- partially observable markov decision processes
- action selection
- negative matrix factorization
- optimal policy
- reinforcement learning
- single agent
- congestion control
- partially observable
- neural network
- markov decision processes
- real world
- multiresolution
- transport layer