CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations.
Vin Sachidananda, Shao-Yen Tseng, Erik Marchi, Sachin Kajarekar, Panayiotis Georgiou
Published in: CoRR (2022)
Keyphrases
- audio visual
- multimedia
- cross modal
- human language
- language learning
- multi modal
- multimodal fusion
- higher level
- language processing
- natural language
- programming language
- multimodal information
- multi stream
- story segmentation
- text to speech
- audio stream
- semantic representations
- audio signals
- target language
- multiple representations
- specification language
- signal reconstruction
- emotion recognition
- visual data
- neural network