Online fitted policy iteration based on extreme learning machines.
Pablo Escandell-Montero, Delia Lorente, José María Martínez-Martínez, Emilio Soria-Olivas, Joan Vila-Francés, José David Martín-Guerrero
Published in: Knowl. Based Syst. (2016)
Keyphrases
- policy iteration
- extreme learning machines
- Markov decision processes
- model free
- reinforcement learning
- least squares
- sample path
- temporal difference
- optimal policy
- fixed point
- infinite horizon
- neural network
- linear program
- convergence rate
- optimal control
- Markov decision process
- optical flow
- average reward
- policy evaluation