SPEAKER VGG CCT: Cross-corpus Speech Emotion Recognition with Speaker Embedding and Vision Transformers.
Alessandro Arezzo
Stefano Berretti
Published in:
CoRR (2022)
Keyphrases
speech emotion recognition
speaker verification
audio visual
speaker recognition
computer vision
speaker diarization
speech recognition
vision system
automatic speech recognition
open domain
speaker dependent
data sets
digital images
multi modal
prosodic features