Off-policy Multi-step Q-learning.
Gabriel Kalweit, Maria Hügle, Joschka Boedecker
Published in: CoRR (2019)
Keyphrases
- multi step
- td learning
- reinforcement learning
- function approximation
- cooperative
- state space
- multi agent
- stochastic approximation
- lower bounding
- optimal policy
- learning algorithm
- learning rate
- reinforcement learning algorithms
- action selection
- single step
- knn
- model free
- semi supervised
- tumor classification
- objective function
- policy iteration
- supervised learning
- training set