Avoiding Wireheading with Value Reinforcement Learning.
Tom Everitt, Marcus Hutter
Published in: CoRR (2016)
Keyphrases
- reinforcement learning
- function approximation
- state space
- robotic control
- model free
- temporal difference
- Markov decision processes
- learning algorithm
- optimal policy
- reinforcement learning algorithms
- evolutionary learning
- function approximators
- artificial intelligence
- stochastic approximation
- temporal difference learning
- action space
- direct policy search
- learning problems
- optimal control
- planning problems
- database
- transfer learning
- learning process
- learning environment
- case study
- computer vision
- neural network