TaL: a synchronised multi-speaker corpus of ultrasound tongue imaging, audio, and lip videos.
Manuel Sam Ribeiro, Jennifer Sanger, Jing-Xuan Zhang, Aciel Eshky, Alan Wrench, Korin Richmond, Steve Renals
Published in: CoRR (2020)
Keyphrases
- ultrasound images
- ultrasound imaging
- audio visual
- visual speech
- speaker identification
- automatic transcription
- audio visual speech recognition
- image analysis
- prosodic features
- multimedia
- video sequences
- audio features
- computer aided
- visual data
- visual information
- vocal tract
- image processing
- speech recognition
- hidden Markov models
- spontaneous speech
- tissue characterization
- video frames
- imaging systems
- motion analysis
- video signals
- radio frequency
- multi modal
- lecture videos
- audio signals
- audio stream
- motion features
- speech synthesis
- multi stream
- acoustic features
- computer vision
- text to speech
- speaker recognition
- sports video
- emotion recognition
- active contour model
- medical imaging
- image sequences
- ultrasound image sequences