Scalable solutions of interactive POMDPs using generalized and bounded policy iteration.

Ekhlas Sonu, Prashant Doshi

Published in: Auton. Agents Multi Agent Syst. (2015)

Keyphrases

policy iteration
Markov decision processes
reinforcement learning
optimal policy
Markov decision problems
policy iteration algorithm
finite state
model free
average reward
Markov decision process
sample path
state space
fixed point
infinite horizon
policy evaluation
dynamic programming
optimal control
optimal solution
partially observable
partially observable Markov decision processes
reinforcement learning algorithms
least squares
belief state
actor critic
temporal difference
neural network