Policy Gradient Reinforcement Learning with Environmental Dynamics and Action-Values in Policies.
Seiji Ishihara
Harukazu Igarashi
Published in:
KES (1) (2011)
Keyphrases
optimal policy
dynamical systems
dynamic model
fitted Q iteration
sufficient conditions
Markov decision processes
initial state
revenue management