Least squares policy iteration with instrumental variables vs. direct policy search: comparison against optimal benchmarks using energy storage.
Somayeh Moazeni
Warren R. Scott
Warren B. Powell
Published in:
INFOR Inf. Syst. Oper. Res. (2020)
Keyphrases
direct policy search
dynamic programming
reinforcement learning
neural network
evolutionary algorithm
temporal difference
optimal solution