Conservative Offline Distributional Reinforcement Learning.
Yecheng Jason Ma, Dinesh Jayaraman, Osbert Bastani
Published in: NeurIPS (2021)
Keyphrases
- reinforcement learning
- function approximation
- learning algorithm
- co-occurrence
- state space
- real time
- reinforcement learning algorithms
- temporal difference
- model-free
- dynamic programming
- markov decision processes
- optimal policy
- policy search
- information systems
- artificial intelligence
- machine learning
- real world
- action space
- direct policy search
- evolutionary learning
- continuous state
- stochastic approximation
- temporal difference learning
- multi-agent systems
- multi-agent
- neural network