Leveraging Visual Supervision for Array-Based Active Speaker Detection and Localization.
Davide Berghi, Philip J. B. Jackson. Published in: IEEE/ACM Trans. Audio Speech Lang. Process. (2024)
Keyphrases
- automatic detection
- detection accuracy
- visual information
- visual cues
- false alarms
- visual features
- detection algorithm
- detection rate
- activity detection
- visual perception
- low level
- anomaly detection
- high level
- accurate localization
- speech recognition
- object detection
- face detection
- detection method
- multimodal
- audio-visual
- image features
- object recognition
- video sequences
- noisy environments
- image sequences
- programmable logic
- reliable detection
- neural network
- generalized hough transform