Facetron: A Multi-Speaker Face-to-Speech Model Based on Cross-Modal Latent Representations.
Seyun Um, Jihyun Kim, Jihyun Lee, Hong-Goo Kang
Published in: EUSIPCO (2023)
Keyphrases
- cross modal
- multi modal
- audio visual
- speech recognition
- automatic speech recognition
- visual data
- speaker identification
- visual speech
- speaker diarization
- visual recognition
- speech signal
- image retrieval
- face images
- multimedia retrieval
- multimedia databases
- visual similarity
- facial expressions
- information retrieval
- perceptual information
- visual content
- high level