Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
Tokio Kajitsuka, Issei Sato
Published in: ICLR (2024)
Keyphrases
- low rank
- weight matrices
- missing data
- convex optimization
- linear combination
- matrix completion
- matrix factorization
- rank minimization
- low rank matrix
- high dimensional data
- semi supervised
- kernel matrix
- singular value decomposition
- weight matrix
- high order
- missing values
- trace norm
- data sets
- low dimensional
- dimensionality reduction
- feature extraction
- machine learning
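To connect the keyphrases to the title, the sketch below shows single-head self-attention in which the query-key and value weight matrices are constrained to rank 1, the low-rank regime the paper's question concerns, together with an SVD truncation of an arbitrary weight matrix to rank 1. The function and variable names (`rank1`, `low_rank_self_attention`, `W_qk`, `W_v`) are illustrative assumptions for this sketch, not the authors' construction.

```python
# Minimal sketch (assumed parameterization, not the paper's proof construction):
# single-head self-attention with rank-1 weight matrices.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def rank1(d_out, d_in, rng):
    # A rank-1 matrix is an outer product u v^T of two vectors.
    u = rng.standard_normal((d_out, 1))
    v = rng.standard_normal((d_in, 1))
    return u @ v.T

def low_rank_self_attention(X, W_qk, W_v):
    # X: (n_tokens, d). Scores use the combined query-key matrix
    # W_qk = W_q^T W_k, so constraining W_qk to rank 1 constrains
    # the query-key interaction itself.
    scores = X @ W_qk @ X.T / np.sqrt(X.shape[1])
    return softmax(scores, axis=-1) @ (X @ W_v.T)

rng = np.random.default_rng(0)
n, d = 4, 8
X = rng.standard_normal((n, d))
W_qk = rank1(d, d, rng)  # rank-1 query-key matrix
W_v = rank1(d, d, rng)   # rank-1 value matrix
out = low_rank_self_attention(X, W_qk, W_v)
print(out.shape)                    # (4, 8)
print(np.linalg.matrix_rank(W_qk))  # 1

# Relatedly (see the "singular value decomposition" keyphrase), an arbitrary
# weight matrix can be truncated to rank 1 by keeping its top singular triplet.
W = rng.standard_normal((d, d))
U, s, Vt = np.linalg.svd(W)
W_r1 = s[0] * np.outer(U[:, 0], Vt[0])
print(np.linalg.matrix_rank(W_r1))  # 1
```

The rank-1 constraint here only limits the attention layer's parameter count; whether one such layer (within a full transformer block) still suffices for universal approximation is exactly the question the paper studies.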