Multimodal Transformer Networks with Latent Interaction for Audio-Visual Event Localization.
Yixuan He
Xing Xu
Xin Liu
Weihua Ou
Huimin Lu
Published in: ICME (2021)
Keyphrases
audio visual
multi modal
visual information
multi stream
event detection
visual data
video summarization
multimedia
temporal context
person authentication
human computer interaction
mobile devices
principal component analysis
multimedia data
multimodal interaction
audio visual speech recognition