Modeling Long-Term Multimodal Representations for Active Speaker Detection With Spatio-Positional Encoder.

Minyoung Kyoung, Hwa Jeon Song

Published in: IEEE Access (2023)

Keyphrases

long-term
spatio-temporal
audio-visual
multimodal
short-term
low complexity
automatic detection
detection method
false positives
motion estimation
temporal correlation
anomaly detection
detection accuracy
neural network
speaker verification
noisy environments
temporal filtering
speech recognition
detection algorithm
bit rate
image quality
object detection