Teacher Forcing Recovers Reward Functions for Text Generation.
Yongchang Hao, Yuxin Liu, Lili Mou
Published in: CoRR (2022)
Keyphrases
- text generation
- reward function
- natural language generation
- reinforcement learning
- markov decision processes
- inverse reinforcement learning
- state space
- optimal policy
- natural language
- state variables
- multiple agents
- policy search
- markov decision process
- simple examples
- theorem prover
- transition probabilities
- generative model
- natural language processing
- preference elicitation
- machine translation
- multi agent
- higher order
- machine learning