The heads hypothesis: A unifying statistical approach towards understanding multi-headed attention in BERT.
Madhura Pande
Aakriti Budhraja
Preksha Nema
Pratyush Kumar
Mitesh M. Khapra
Published in:
CoRR (2021)
Keyphrases
data driven
statistical analysis
statistical information
focus of attention
statistical models
visual attention
data mining
clustering algorithm
data structure
artificial neural networks
evolutionary algorithm
statistical approaches