Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning.
Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu
Published in: CoRR (2023)
Keyphrases
- reward function
- reinforcement learning
- reinforcement learning algorithms
- markov decision processes
- state space
- multiple agents
- optimal policy
- partially observable
- policy search
- inverse reinforcement learning
- transition probabilities
- markov decision process
- function approximation
- state variables
- state action
- average reward
- hierarchical reinforcement learning
- dynamic programming
- keywords
- action selection
- reward shaping
- learning agent
- initially unknown
- data mining
- temporal difference
- model free
- markov chain
- learning algorithm
- machine learning
- dynamic systems
- decision problems
- action space
- web documents
- text mining
- search space
- multi agent