LAPO: Latent-Variable Advantage-Weighted Policy Optimization for Offline Reinforcement Learning.
Xi Chen, Ali Ghadirzadeh, Tianhe Yu, Jianhao Wang, Alex Yuan Gao, Wenzhe Li, Liang Bin, Chelsea Finn, Chongjie Zhang
Published in: NeurIPS (2022)
Keyphrases
- latent variables
- reinforcement learning
- optimal policy
- policy search
- probabilistic model
- action selection
- random variables
- Markov decision process
- prior knowledge
- state space
- topic models
- Markov decision processes
- hierarchical model
- reinforcement learning algorithms
- reinforcement learning problems
- latent variable models
- function approximation
- function approximators
- real valued
- dynamic programming
- temporal difference
- structured prediction
- machine learning