Audio-Visual Speech Representation Expert for Enhanced Talking Face Video Generation and Evaluation.
Dogucan Yaman, Fevziye Irem Eyiokur, Leonard Bärmann, Seymanur Akti, Hazim Kemal Ekenel, Alexander Waibel
Published in: CoRR (2024)
Keyphrases
- audio visual
- video summarization
- person authentication
- visual data
- multimodal fusion
- multi modal
- multimedia
- audio features
- audio visual content
- emotion recognition
- visual information
- temporal context
- multi stream
- video data
- speaker verification
- video frames
- video content
- audio visual speech recognition
- video sequences
- video streams
- space time
- contextual information
- human computer interaction
- visual features
- image database
- natural language processing
- nearest neighbor
- domain knowledge