Language Reward Modulation for Pretraining Reinforcement Learning.
Ademi Adeniji, Amber Xie, Carmelo Sferrazza, Younggyo Seo, Stephen James, Pieter Abbeel
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- natural language
- programming language
- reinforcement learning algorithms
- temporal difference
- state space
- optimal policy
- Markov decision processes
- function approximation
- model free
- reward function
- dynamic programming
- supervised learning
- language learning
- learning agent
- reinforcement learning methods
- learning problems
- eligibility traces
- multi agent
- machine learning
- state action
- partially observable environments
- average reward
- partially observable
- language processing
- data model
- learning algorithm