Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems.

Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tur, Wael Hamza, Jonathan J. Hüser, Kevin Martin Jose, Haidar Khan, Beiye Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, Gokmen Oz, Enrico Palumbo, Charith Peris, Chandana Satya Prakash, Stephen Rawls, Andy Rosenbaum, Anjali Shenoy, Saleh Soltan, Mukund Harakere Sridhar, Lizhen Tan, Fabian Triefenbach, Pan Wei, Haiyang Yu, Shuai Zheng, Gökhan Tür, Prem Natarajan
Published in: KDD (2022)
Keyphrases
  • natural language understanding
  • management system
  • probabilistic model
  • computational model
  • multi agent systems
  • prior knowledge