Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners.
Sarthak Yadav, Sergios Theodoridis, Lars Kai Hansen, Zheng-Hua Tan
Published in: ICLR (2024)
Keyphrases
- denoising
- learning process
- learning environment
- signal processing
- e-learning
- learning activities
- learning community
- multimedia
- audio visual
- visual attention
- learning resources
- concept maps
- visual data
- visual information
- language learning
- learning experience
- sliding window
- collaborative learning
- window size
- focus of attention
- audio video