SoftTreeMax: Policy Gradient with Tree Search

Gal Dalal, Assaf Hallak, Shie Mannor, Gal Chechik

Published in: CoRR (2022)

Keyphrases

tree search
policy gradient
branch and bound
search algorithm
reinforcement learning
search tree
optimal control
constraint propagation
reinforcement learning algorithms
gradient method
state space
function approximation
path finding
mathematical programming
approximation methods
single agent
variance reduction
game tree
orders of magnitude
reinforcement learning methods
average reward
partially observable markov decision processes
search space
convergence rate
monte carlo
model free
simulated annealing
neural network