Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning.
Zhiheng Xi, Wenxiang Chen, Boyang Hong, Senjie Jin, Rui Zheng, Wei He, Yiwen Ding, Shichun Liu, Xin Guo, Junzhe Wang, Honglin Guo, Wei Shen, Xiaoran Fan, Yuhao Zhou, Shihan Dou, Xiao Wang, Xinbo Zhang, Peng Sun, Tao Gui, Qi Zhang, Xuanjing Huang
Published in: CoRR (2024)
Keyphrases
- language model
- reinforcement learning
- language modeling
- document retrieval
- n gram
- probabilistic model
- speech recognition
- query expansion
- information retrieval
- statistical language models
- retrieval model
- ad hoc information retrieval
- test collection
- supervised learning
- context sensitive
- vector space model
- smoothing methods
- training set
- language modelling
- language models for information retrieval
- language model for information retrieval
- query specific
- translation model
- pseudo relevance feedback
- query terms
- machine learning
- document ranking
- term dependencies
- information retrieval systems
- recommender systems