Audio-Visual Neural Syntax Acquisition.
Cheng-I Jeff Lai
Freda Shi
Puyuan Peng
Yoon Kim
Kevin Gimpel
Shiyu Chang
Yung-Sung Chuang
Saurabhchand Bhati
David D. Cox
David Harwath
Yang Zhang
Karen Livescu
James R. Glass
Published in:
CoRR (2023)
Keyphrases
audio visual
multi modal
visual information
visual data
video summarization
multimedia
person authentication
temporal context
emotion recognition
multi stream
neural network
multimodal fusion
high level
pattern recognition
natural language
object recognition