Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature.
Kefan Dong, Jiaqi Yang, Tengyu Ma
Published in: NeurIPS (2021)
Keyphrases
- reinforcement learning
- model-free
- virtual environment
- function approximation
- reinforcement learning algorithms
- virtual world
- virtual reality
- multiscale
- markov chain
- learning algorithm
- multi-armed bandit
- augmented reality
- curvature estimation
- state space
- machine learning
- optimal policy
- optimal control
- virtual laboratory
- learning problems
- random sampling
- learning classifier systems
- temporal difference
- learning process
- artificial neural networks
- e-learning
- robotic control