Alexa Teacher Model: Pretraining and Distilling Multi-Billion-Parameter Encoders for Natural Language Understanding Systems.
Jack FitzGerald, Shankar Ananthakrishnan, Konstantine Arkoudas, Davide Bernardi, Abhishek Bhagia, Claudio Delli Bovi, Jin Cao, Rakesh Chada, Amit Chauhan, Luoxin Chen, Anurag Dwarakanath, Satyam Dwivedi, Turan Gojayev, Karthik Gopalakrishnan, Thomas Gueudre, Dilek Hakkani-Tur, Wael Hamza, Jonathan J. Hüser, Kevin Martin Jose, Haidar Khan, Beiye Liu, Jianhua Lu, Alessandro Manzotti, Pradeep Natarajan, Karolina Owczarzak, Gokmen Oz, Enrico Palumbo, Charith Peris, Chandana Satya Prakash, Stephen Rawls, Andy Rosenbaum, Anjali Shenoy, Saleh Soltan, Mukund Harakere Sridhar, Lizhen Tan, Fabian Triefenbach, Pan Wei, Haiyang Yu, Shuai Zheng, Gökhan Tür, Prem Natarajan
Published in: KDD (2022)