Transcending Scaling Laws with 0.1% Extra Compute.
Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc Le, Mostafa Dehghani
Published in: EMNLP (2023)