Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems.
Sushmita Bhattacharya, Sahil Badyal, Thomas Wheeler, Stephanie Gil, Dimitri P. Bertsekas
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- policy iteration
- Markov decision processes
- Markov decision process
- optimal policy
- policy evaluation
- model free
- Markov decision problems
- partially observable Markov decision processes
- control problems
- state space
- reinforcement learning methods
- decision problems
- finite state
- approximate policy iteration
- action space
- optimal control
- function approximation
- multi agent
- temporal difference
- least squares
- function approximators
- continuous state
- average reward
- partially observable
- supervised learning
- temporal difference learning
- stochastic approximation
- control system
- Markov games
- dynamic programming
- learning algorithm