Modeling Long-Term Multimodal Representations for Active Speaker Detection With Spatio-Positional Encoder.
Minyoung Kyoung, Hwa Jeon Song
Published in: IEEE Access (2023)
Keyphrases
- long term
- spatio temporal
- audio visual
- multi modal
- short term
- low complexity
- automatic detection
- detection method
- false positives
- motion estimation
- temporal correlation
- anomaly detection
- detection accuracy
- neural network
- speaker verification
- noisy environments
- temporal filtering
- speech recognition
- detection algorithm
- bit rate
- image quality
- object detection