The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models.
Conglong Li, Minjia Zhang, Yuxiong He
Published in: NeurIPS (2022)
Keyphrases
- computational models
- statistical models
- genetic algorithm
- information systems
- probabilistic model
- training set
- structured prediction
- neural network model
- statistical model
- machine learning algorithms
- training examples
- prior knowledge
- database
- computational complexity
- video sequences
- information retrieval
- neural network
- data sets
- real time