Avoiding Wireheading with Value Reinforcement Learning

Tom Everitt, Marcus Hutter

Published in: CoRR (2016)

Keyphrases

reinforcement learning
function approximation
state space
robotic control
model free
temporal difference
markov decision processes
learning algorithm
optimal policy
reinforcement learning algorithms
evolutionary learning
function approximators
artificial intelligence
stochastic approximation
temporal difference learning
action space
direct policy search
learning problems
optimal control
planning problems
database
transfer learning
learning process
learning environment
case study
computer vision
neural network