DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning.
Hussein Hazimeh, Zhe Zhao, Aakanksha Chowdhery, Maheswaran Sathiamoorthy, Yihua Chen, Rahul Mazumder, Lichan Hong, Ed H. Chi
Published in: CoRR (2021)