AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation.
Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro
Published in: CoRR (2023)
Keyphrases
- visual speech
- visual speech recognition
- hidden Markov models
- audio visual speech recognition
- audio visual
- noisy environments
- speaker identification
- lip reading
- audio signals
- acoustic features
- speech signal
- video signals
- broadcast news
- speaker verification
- audio features
- speech recognition
- multi stream
- multi modal
- text to speech
- non stationary
- feature vectors