Let's reward step by step: Step-Level reward model as the Navigators for Reasoning.
Qianli Ma, Haotian Zhou, Tingkai Liu, Jianbo Yuan, Pengfei Liu, Yang You, Hongxia Yang
Published in: CoRR (2023)
Keyphrases
- formal model
- statistical model
- computational model
- neural network
- simulation model
- post processing
- management system
- probabilistic model
- objective function
- reinforcement learning
- theoretical analysis
- similarity measure
- high level
- theoretical framework
- mathematical model
- knowledge base
- experimental data
- network structure
- machine learning
- data sets