Multimodal Speech Emotion Recognition Using Audio and Text.
Seunghyun Yoon, Seokhyun Byun, Kyomin Jung
Published in: CoRR (2018)
Keyphrases
- text to speech synthesis
- audio visual
- emotion recognition
- text to speech
- multimodal fusion
- multimodal interfaces
- multimodal interaction
- multi modal
- text input
- spoken documents
- affect detection
- prosodic features
- multi stream
- text graphics
- emotional speech
- speech synthesis
- broadcast news
- speaker verification
- multimedia
- story segmentation
- visual information
- visual data
- text recognition
- audio signals
- audio stream
- facial expressions
- text mining
- audio features
- human computer interaction
- visual speech
- high robustness
- keywords
- english text
- multi lingual
- information retrieval
- audio video
- emotional state
- audio recordings
- emotion classification
- text data
- human language
- multiple modalities
- speaker identification
- news video
- cepstral features
- audio content
- speech processing
- music retrieval
- cross modal
- affective states
- speech signal
- video sequences