Efficient Exploration for Dialog Policy Learning with Deep BBQ Networks & Replay Buffer Spiking.

Published in: CoRR (2016)

Keyphrases