Teacher Forcing Recovers Reward Functions for Text Generation.
Yongchang Hao, Yuxin Liu, Lili Mou
Published in: NeurIPS (2022)
Keyphrases
- text generation
- reward function
- natural language generation
- Markov decision processes
- natural language
- state space
- reinforcement learning
- multiple agents
- inverse reinforcement learning
- optimal policy
- state variables
- theorem prover
- transition probabilities
- simple examples
- Markov decision process
- policy search
- dynamic programming
- search space