MAST: Multiscale Audio Spectrogram Transformers.
Sreyan Ghosh, Ashish Seth, Srinivasan Umesh, Dinesh Manocha
Published in: ICASSP (2023)
Keyphrases
- multiscale
- pattern analysis
- multimedia
- audio visual
- natural images
- coarse to fine
- signal processing
- visual information
- image representation
- scale space
- speech signal
- digital video
- visual data
- speaker identification
- wavelet transform
- image processing
- filter bank
- multiscale analysis
- wigner distribution
- image segmentation
- audio signals
- edge detection
- image fusion
- computer vision
- cross modal
- cepstral features
- emotion recognition
- wavelet decomposition
- soccer video
- shape representation
- single channel
- partial differential equations
- multi modal
- energy distribution
- multiscale representation
- short time fourier transform
- low level
- audio stream
- feature space