Speech Emotion Recognition using Convolutional Neural Network with Audio Word-based Embedding.
Kun-Yi Huang, Chung-Hsien Wu, Qian-Bei Hong, Ming-Hsiang Su, Yuan-Rong Zeng
Published in: ISCSLP (2018)
Keyphrases
- convolutional neural network
- emotion recognition
- prosodic features
- audio visual
- automatic transcription
- spoken documents
- emotional speech
- face detection
- text to speech
- audio stream
- speech synthesis
- speaker verification
- text to speech synthesis
- spontaneous speech
- broadcast news
- emotional state
- multimodal fusion
- facial expressions
- spoken document retrieval
- audio signals
- neural network
- speech recognition
- speech recognizer
- human computer interaction
- english words
- text input
- emotion classification
- vector space
- co occurrence
- automatic speech recognition
- visual information
- multi stream
- english text
- audio video
- audio features
- speech recognition systems
- digital audio
- cepstral features
- speech processing
- recognition errors
- audio recordings
- speaker identification
- word recognition
- multi modal
- acoustic signals
- object detection
- face recognition
- spoken term detection
- image sequences