Login / Signup
Transformers Provably Learn Sparse Token Selection While Fully-Connected Nets Cannot.
Zixuan Wang
Stanley Wei
Daniel Hsu
Jason D. Lee
Published in:
CoRR (2024)
Keyphrases
</>
fully connected
conditional random fields
high dimensional
similarity measure
activation function
scale free