On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers.
Tianchu Ji, Shraddhan Jain, Michael Ferdman, Peter A. Milder, H. Andrew Schwartz, Niranjan Balasubramanian
Published in: CoRR (2021)
Keyphrases
- distribution function
- marginal distributions
- visual attention
- high dimensional
- uniformly distributed
- sparse representation
- random fields
- bayesian networks
- spatial distribution
- probabilistic inference
- data distribution
- parameter values
- standard deviation
- gaussian distribution
- density function
- random variables
- collaborative filtering
- probability distribution
- data sets