Q-Learning for Continuous Actions with Cross-Entropy Guided Policies

Riley Simmons-Edler, Ben Eisner, Eric Mitchell, H. Sebastian Seung, Daniel D. Lee

Published in: CoRR (2019)

Keyphrases

cross entropy
reward function
optimal policy
action selection
action space
reinforcement learning
state space
log likelihood
state action
maximum likelihood
reinforcement learning algorithms
error function
Markov decision processes
language modeling
learning algorithm
multiagent reinforcement learning
learning rate
Markov decision process
language model
scoring function