MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
Xize Cheng, Linjun Li, Tao Jin, Rongjie Huang, Wang Lin, Zehan Wang, Huangdai Liu, Ye Wang, Aoxiong Yin, Zhou Zhao
Published in: CoRR (2023)
Keyphrases
- audio visual
- audio visual speech recognition
- multi stream
- person authentication
- visual speech
- hidden markov models
- visual speech recognition
- multi modal
- noisy environments
- visual information
- object recognition
- speaker identification
- emotion recognition
- pattern recognition
- biometric identification
- multimodal biometrics
- multimedia
- visual data
- feature extraction
- speaker verification
- audio signals
- feature selection
- audio features
- gait recognition
- noise reduction
- human activities
- activity recognition
- speech recognition
- image classification