Learning to Maximize Mutual Information for Chain-of-Thought Distillation.
Xin Chen, Hanxian Huang, Yanjun Gao, Yi Wang, Jishen Zhao, Ke Ding
Published in: CoRR (2024)
Keyphrases
- mutual information
- learning algorithm
- pattern recognition
- prior knowledge
- elementary school
- learning scheme
- learning problems
- learning tasks
- information theoretic
- learning systems
- online learning
- data mining
- neural network
- text classification
- supervised learning
- empirical studies
- learning process
- background knowledge
- machine learning
- real time
- database