Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot

Zixuan Wang, Stanley Wei, Daniel Hsu, Jason D. Lee
Published in: CoRR (2024)
Keyphrases
  • fully connected
  • conditional random fields
  • high dimensional
  • similarity measure
  • activation function
  • scale free