SERVER: Multi-modal Speech Emotion Recognition using Transformer-based and Vision-based Embeddings.
Nhat Truong Pham, Duc Ngoc Minh Dang, Bich Ngoc Hong Pham, Sy Dzung Nguyen. Published in: ICIIT (2023)
Keyphrases
- multi-modal
- audio-visual
- emotion recognition
- text-to-speech synthesis
- emotional state
- human-computer interaction
- emotional speech
- hands-free
- vector space
- speech recognition
- speech signal
- facial expressions
- cross-modal
- unimodal
- video search
- multi-modality
- Euclidean space
- semantic concepts
- text-to-speech
- fusing multiple