Diarizing large corpora using multi-modal speaker linking.
Marc Ferras, Stefano Masneri, Oliver Schreer, Hervé Bourlard
Published in: INTERSPEECH (2014)
Keyphrases
- multi modal
- audio visual
- speaker verification
- natural language processing
- multi modality
- cross modal
- speaker diarization
- speech recognition
- image annotation
- semantic concepts
- single modality
- automatic speech recognition
- feature selection
- statistical machine translation
- humanoid robot
- audio features
- speaker identification
- higher level
- high dimensional
- image processing