Backprop-MPDM: Faster Risk-Aware Policy Evaluation Through Efficient Gradient Optimization.

Dhanvin Mehta, Gonzalo Ferrer, Edwin Olson

Published in: ICRA (2018)

Keyphrases

policy evaluation
least squares
neural network
machine learning
multi-agent
evolutionary algorithm
cost function
graphical models
monte carlo
decision problems
fixed point
learning tasks
function approximation
temporal difference