Code as Reward: Empowering Reinforcement Learning with VLMs.
David Venuto, Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand
Published in: CoRR (2024)
Keyphrases
- reinforcement learning
- function approximation
- source code
- state space
- reward function
- learning algorithm
- reinforcement learning algorithms
- eligibility traces
- supervised learning
- multi agent
- Markov decision processes
- model free
- learning process
- optimal policy
- machine learning
- learning agent
- total reward
- reinforcement learning methods
- function approximators
- learning capabilities
- partially observable environments
- temporal difference
- optimal control
- learning problems
- transfer learning
- partially observable
- action selection
- temporal difference learning
- Markov decision problems
- policy gradient
- transition model
- open source
- robotic control
- dynamic programming