Expectation Maximization for Average Reward Decentralized POMDPs.
Joni Pajarinen, Jaakko Peltonen
Published in: ECML/PKDD (1) (2013)
Keyphrases
- average reward
- expectation maximization
- partially observable markov decision processes
- markov decision processes
- dec pomdps
- em algorithm
- optimal policy
- long run
- reinforcement learning
- infinite horizon
- distributed constraint optimization
- stochastic games
- multi agent
- finite state
- optimality criterion
- semi markov decision processes
- probabilistic model
- maximum likelihood
- policy gradient
- generative model
- decision problems
- state space
- dynamic programming
- model free
- policy iteration
- total reward
- markov chain
- dynamical systems
- planning under uncertainty
- image segmentation
- belief state
- discounted reward
- hierarchical reinforcement learning
- state action
- sufficient conditions
- partially observable
- state and action spaces
- monte carlo
- decision theoretic
- actor critic
- supervised learning
- average cost
- reinforcement learning algorithms
- fixed point
- machine learning