LSG Attention: Extrapolation of pretrained Transformers to long sequences.

Charles Condevaux, Sébastien Harispe
Published in: CoRR (2022)
Keyphrases