SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond.
Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji
Published in: CoRR (2024)
Keyphrases
- multimedia
- signal processing
- audio visual
- modeling method
- audio recordings
- cost effective
- computationally expensive
- visual data
- audio stream
- audio video
- audio signals
- cross modal
- visual information
- lightweight
- generative model
- data sets
- computationally efficient
- digital video
- emotion recognition
- multimedia information
- visual features
- broadcast news
- speaker identification
- video recordings
- maximum likelihood
- bayesian networks
- image processing