Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?

Tokio Kajitsuka, Issei Sato
Published in: CoRR (2023)