Frame-Level Phoneme-Invariant Speaker Embedding for Text-Independent Speaker Recognition on Extremely Short Utterances.
Naohiro Tawara, Atsunori Ogawa, Tomoharu Iwata, Marc Delcroix, Tetsuji Ogawa
Published in: ICASSP (2020)
Keyphrases
- speech sounds
- speech recognition
- speaker dependent
- automatic speech recognition systems
- automatic speech recognition
- recognition rate
- phoneme recognition
- recognition accuracy
- recognition algorithm
- invariant moments
- object recognition
- automatic recognition
- vocal tract
- prosodic features
- video frames
- higher level
- pattern recognition
- speaker independent
- speech signal
- speech synthesis
- speaker identification
- feature extraction
- facial action units
- audio visual
- action recognition
- hidden markov models
- video sequences