Transcending Scaling Laws with 0.1% Extra Compute.
Yi Tay, Jason Wei, Hyung Won Chung, Vinh Q. Tran, David R. So, Siamak Shakeri, Xavier Garcia, Huaixiu Steven Zheng, Jinfeng Rao, Aakanksha Chowdhery, Denny Zhou, Donald Metzler, Slav Petrov, Neil Houlsby, Quoc V. Le, Mostafa Dehghani
Published in: CoRR (2022)