Reward Gaming in Conditional Text Generation.
Richard Yuanzhe Pang, Vishakh Padmakumar, Thibault Sellam, Ankur P. Parikh, He He
Published in: ACL (1) (2023)
Keyphrases
- text generation
- natural language generation
- natural language
- computer games
- reinforcement learning
- theorem prover
- random field model
- educational games
- long run
- virtual environment
- video games
- bandit problems
- game based learning
- reward function
- machine translation
- policy gradient
- game players
- natural language processing