Policy Shaping: Integrating Human Feedback with Reinforcement Learning.
Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles L. Isbell Jr., Andrea Lockerd Thomaz
Published in: NIPS (2013)
Keyphrases
- reinforcement learning
- optimal policy
- reward shaping
- policy search
- action selection
- approximate dynamic programming
- markov decision process
- policy iteration
- state space
- state and action spaces
- reinforcement learning algorithms
- reward signal
- human behavior
- human operators
- creative problem solving
- control policy
- partially observable environments
- temporal difference
- markov decision problems
- policy gradient
- state action
- control policies
- actor critic
- reinforcement learning problems
- action space
- user engagement
- human subjects
- decision problems
- relevance feedback
- partially observable markov decision processes
- agent learns
- policy evaluation
- neural network
- partially observable
- infinite horizon
- finite state
- optimal control
- markov decision processes
- supervised learning
- dynamic programming
- learning algorithm