LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers.

Dacheng Li, Rulin Shao, Anze Xie, Eric P. Xing, Joseph E. Gonzalez, Ion Stoica, Xuezhe Ma, Hao Zhang
Published in: CoRR (2023)
Keyphrases
  • level parallelism
  • training set
  • distributed systems
  • artificial intelligence
  • instruction set
  • neural network
  • low cost
  • sufficient conditions
  • computer systems
  • fine grained
  • mobile agents