Silent Speech Recognition with Articulator Positions Estimated from Tongue Ultrasound and Lip Video.
Rachel Beeson, Korin Richmond
Published in: INTERSPEECH (2023)
Keyphrases
- audio visual speech recognition
- speech recognition
- ultrasound images
- multi stream
- audio visual
- visual speech
- video data
- hidden Markov models
- visual speech recognition
- video sequences
- speaker identification
- multi modal
- video content
- multimedia
- noisy environments
- speech recognition technology
- speech signal
- speech processing
- video retrieval
- isolated word
- language model
- video streams
- automatic speech recognition
- video database
- visual data
- lip reading
- key frames
- audio signal
- vocal tract
- principal component analysis