MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
Xize Cheng, Tao Jin, Rongjie Huang, Linjun Li, Wang Lin, Zehan Wang, Ye Wang, Huadai Liu, Aoxiong Yin, Zhou Zhao
Published in: ICCV (2023)
Keyphrases
- audio visual speech recognition
- audio visual
- multi stream
- visual speech
- person authentication
- hidden markov models
- visual speech recognition
- multi modal
- object recognition
- pattern recognition
- visual information
- acoustic features
- speaker verification
- emotion recognition
- multimedia
- speaker identification
- video signals
- visual data
- multimodal biometrics
- activity recognition
- audio features
- speech signal
- search engine
- human activities
- speech recognition
- spatio temporal
- feature extraction