Rethinking Audio-visual Synchronization for Active Speaker Detection.
Abudukelimu Wuerkaixi, You Zhang, Zhiyao Duan, Changshui Zhang
Published in: CoRR (2022)
Keyphrases
- audio visual
- multi stream
- multi modal
- visual information
- speaker verification
- visual data
- person authentication
- audio visual speech recognition
- video summarization
- emotion recognition
- temporal context
- multimedia
- semantic information
- contextual information
- noisy environments
- domain knowledge
- hidden markov models
- three dimensional
- computer vision