Q-Learning for Continuous Actions with Cross-Entropy Guided Policies.
Riley Simmons-Edler, Ben Eisner, Eric Mitchell, H. Sebastian Seung, Daniel D. Lee
Published in: CoRR (2019)
Keyphrases
- cross entropy
- reward function
- optimal policy
- action selection
- action space
- reinforcement learning
- state space
- log likelihood
- state action
- maximum likelihood
- reinforcement learning algorithms
- error function
- Markov decision processes
- language modeling
- learning algorithm
- multiagent reinforcement learning
- learning rate
- Markov decision process
- language model
- scoring function