Policy improvement by planning with Gumbel.
Ivo Danihelka, Arthur Guez, Julian Schrittwieser, David Silver. Published in: ICLR (2022)
Keyphrases
- action selection
- planning problems
- optimal policy
- significant improvement
- partially observable
- stochastic domains
- reinforcement learning problems
- planning process
- policy makers
- goal oriented
- heuristic search
- decision support
- ai planning
- asymptotically optimal
- partially observable markov decision processes
- mixed initiative
- domain independent
- policy making
- reinforcement learning