Silent versus modal multi-speaker speech recognition from ultrasound and video.
Manuel Sam Ribeiro, Aciel Eshky, Korin Richmond, Steve Renals
Published in: CoRR (2021)
Keyphrases
- speech recognition
- automatic speech recognition
- hidden Markov models
- language model
- digital video library
- speaker identification
- speech signal
- speaker dependent
- speech synthesis
- multimedia
- video sequences
- video content
- pattern recognition
- speech processing
- speech recognition technology
- video data
- noisy environments
- speaker adaptation
- speech recognition systems
- video frames
- speaker recognition
- speech recognizer
- speaker diarization
- broadcast news
- cepstral coefficients
- speech retrieval
- Bayesian networks
- video retrieval
- key frames