Hierarchical Policy for Non-prehensile Multi-object Rearrangement with Deep Reinforcement Learning and Monte Carlo Tree Search.
Fan Bai, Fei Meng, Jianbang Liu, Jiankun Wang, Max Q.-H. Meng
Published in: CoRR (2021)
Keyphrases
- multi object
- monte carlo tree search
- bayesian reinforcement learning
- reinforcement learning
- optimal policy
- temporal difference
- reinforcement learning methods
- temporal difference learning
- monte carlo
- markov decision process
- monte carlo search
- action selection
- multiple objects
- human perception
- evaluation function
- markov decision processes
- reinforcement learning algorithms
- policy iteration
- partially observable markov decision processes
- function approximators
- function approximation
- state space
- coarse to fine
- machine learning
- reward function
- action space
- decision problems
- average reward
- learning algorithm
- infinite horizon
- statistical shape model
- dynamic programming
- fixed point
- partially observable
- model free
- control problems
- motion trajectories
- multimedia
- optimal strategy