The In-Sample Softmax for Offline Reinforcement Learning.
Chenjun Xiao, Han Wang, Yangchen Pan, Adam White, Martha White
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- temporal difference learning
- state space
- function approximation
- temporal difference
- markov decision processes
- real time
- reinforcement learning algorithms
- model free
- machine learning
- action selection
- optimal policy
- learning process
- sample points
- learning algorithm
- multi agent reinforcement learning
- dynamic programming
- optimal control
- active learning
- multi agent
- randomly selected
- training data
- data samples
- hidden nodes