Is Someone Speaking?: Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection.
Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Published in: ACM Multimedia (2021)
Keyphrases
- audio visual
- long term
- multi modal
- visual information
- speaker verification
- multimedia
- emotion recognition
- visual data
- person authentication
- multi stream
- temporal context
- metadata
- audio visual speech recognition
- video summarization
- data management
- multimedia data
- high dimensional
- pattern recognition
- three dimensional
- computer vision