Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards.
Heriberto Cuayáhuitl, Donghyeon Lee, Seonghan Ryu, Sungja Choi, Inchul Hwang, Jihie Kim
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- behavioural cloning
- reward function
- perceptual aliasing
- action selection
- reward signal
- partially observable
- function approximation
- optimal policy
- Markov decision processes
- multi-agent
- human operators
- state space
- human activities
- machine learning
- partially observable domains
- state and action spaces
- sensory inputs
- reinforcement learning algorithms
- model-free
- partial observability
- temporal difference
- state-action
- action space
- transfer learning
- learned knowledge
- human actions
- human beings
- control policy
- learning algorithm
- reward shaping
- optimal control
- human behavior
- learning agent
- human subjects
- Markov decision problems
- neural network
- plan recognition
- decision-theoretic