Recurrent multi-head attention fusion network for combining audio and text for speech emotion recognition.
Chung Soo Ahn, Chamara Kasun, Sunil Sivadas, Jagath C. Rajapakse
Published in: INTERSPEECH (2022)
Keyphrases
- speech emotion recognition
- text graphics
- recurrent networks
- multimedia
- peer to peer
- real time
- text mining
- information retrieval
- network traffic
- text retrieval
- data fusion
- spiking neural networks
- text to speech
- network model
- multi sensor
- information fusion
- feed forward
- computer networks
- network structure
- combining multiple
- keywords
- late fusion
- database