Multimodal Speech Emotion Recognition Using Cross Attention with Aligned Audio and Text.
Yoonhyung Lee, Seunghyun Yoon, Kyomin Jung. Published in: INTERSPEECH (2020)
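The paper's method is not detailed on this page; as a rough illustration of the technique named in the title, the sketch below shows generic cross-attention fusion of aligned audio and text features in PyTorch. All names, dimensions, and the pooling and classifier choices are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Generic cross-attention fusion sketch: word-level text features
    attend over time-aligned frame-level audio features. Illustrative
    only, not the paper's exact model."""

    def __init__(self, dim: int = 128, n_heads: int = 4, n_classes: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.classifier = nn.Linear(dim, n_classes)

    def forward(self, text: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # text:  (batch, n_words, dim)   -- e.g. word embeddings
        # audio: (batch, n_frames, dim)  -- e.g. projected acoustic frames
        # Each text token queries the audio sequence (cross attention).
        fused, _ = self.cross_attn(query=text, key=audio, value=audio)
        # Mean-pool over words, then predict the utterance-level emotion.
        return self.classifier(fused.mean(dim=1))

# Usage with random stand-in features (2 utterances, 12 words, 50 frames).
model = CrossModalFusion()
logits = model(torch.randn(2, 12, 128), torch.randn(2, 50, 128))
print(logits.shape)  # torch.Size([2, 4])
```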
Keyphrases
- audio-visual
- text-to-speech synthesis
- emotion recognition
- text-to-speech
- multimodal fusion
- multimodal interfaces
- multimodal interaction
- multimodal
- multi-stream
- emotional speech
- spoken documents
- multimedia
- text graphics
- human language
- speech synthesis
- prosodic features
- visual information
- broadcast news
- speaker verification
- audio stream
- story segmentation
- human-computer interaction
- text input
- cepstral features
- english text
- audio features
- text mining
- emotional state
- audio signals
- multilingual
- content-based video retrieval
- keywords
- digital audio
- facial expressions
- affect detection
- information retrieval
- high robustness
- affect sensing
- text data
- affective states
- audio content
- video search
- cross-modal
- lexical features
- spontaneous speech
- emotion classification
- visual data