Facetron: Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations.
Se-Yun Um
Jihyun Kim
Jihyun Lee
Sangshin Oh
Kyungguen Byun
Hong-Goo Kang
Published in:
CoRR (2021)
Keyphrases
cross modal
multi modal
audio visual
speech recognition
automatic speech recognition
visual data
speaker identification
visual speech
multimedia retrieval
speaker diarization
image retrieval
perceptual information
language model
visual similarity
visual recognition
image annotation
generative model
higher level