Inter-speaker variability in audio-visual classification of word prominence.
Martin Heckmann
Published in: INTERSPEECH (2013)
Keyphrases
- visual classification
- prosodic features
- automatic transcription
- speaker verification
- audio visual
- visual recognition
- image classification
- scene classification
- face recognition
- speaker identification
- bag of features
- multiple features
- multi modal
- co occurrence
- visual information
- speech recognition
- linear classifiers
- distance metric learning
- speech signal
- n gram
- knn
- object recognition