Limited depth bandit-based strategy for Monte Carlo planning in continuous action spaces.
Ricardo Quinteiro, Francisco S. Melo, Pedro A. Santos
Published in: CoRR (2021)
Keyphrases
- monte carlo
- action space
- state space
- markov chain
- optimal strategy
- action selection
- planning problems
- markov decision processes
- real valued
- heuristic search
- continuous action
- monte carlo simulation
- continuous state
- single agent
- reinforcement learning
- importance sampling
- stochastic processes
- quasi monte carlo
- particle filter
- partially observable
- temporal difference
- monte carlo methods
- matrix inversion
- monte carlo tree search
- markov decision process
- dynamic programming
- reinforcement learning methods
- transition probabilities
- multi agent
- search space
- multiple agents
- decision theoretic
- variance reduction
- function approximators
- markov decision problems
- probability distribution
- function approximation
- finite state
- machine learning