Reinforcement Learning in Partially Observable Multiagent Settings: Monte Carlo Exploring Policies with PAC Bounds.
Roi Ceren, Prashant Doshi, Bikramjit Banerjee
Published in: AAMAS (2016)
Keyphrases
- partially observable
- monte carlo
- reinforcement learning
- markovian decision
- state space
- markov decision problems
- optimal policy
- reward function
- markov decision processes
- upper bound
- partially observable markov decision processes
- markov chain
- variance reduction
- markov decision process
- decision problems
- infinite horizon
- partially observable environments
- partial observability
- dynamical systems
- reinforcement learning algorithms
- partially observable domains
- lower bound
- action models
- monte carlo tree search
- temporal difference
- particle filter
- dynamic programming
- heuristic search
- function approximation
- importance sampling
- finite state
- transition probabilities
- worst case
- average cost
- learning algorithm
- policy evaluation
- belief state
- policy iteration
- planning problems
- orders of magnitude
- action space
- temporal difference learning
- multi agent
- long run
- model free
- average reward
- state variables
- branch and bound
- game tree