Audio-Visual Neural Syntax Acquisition.
Cheng-I Jeff Lai
Freda Shi
Puyuan Peng
Yoon Kim
Kevin Gimpel
Shiyu Chang
Yung-Sung Chuang
Saurabhchand Bhati
David D. Cox
David Harwath
Yang Zhang
Karen Livescu
James R. Glass
Published in:
ASRU (2023)
Keyphrases
audio visual
multi modal
visual information
neural network
multi stream
video summarization
emotion recognition
visual data
audio visual speech recognition
temporal context
person authentication
multimedia
image classification
three dimensional
contextual information
feature selection
data sets