Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog.
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Àgata Lapedriza, Noah Jones, Shixiang Gu, Rosalind W. Picard
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- function approximation
- learning process
- data sets
- batch mode
- multi attribute
- neural network
- temporal difference
- markov decision processes
- dynamic programming
- multi agent
- state space
- optimal policy
- user interface
- natural language
- human users
- model free
- action selection
- mixed initiative
- decision making
- human machine
- dialog systems
- explicit feedback
- behavioural cloning