End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features.
Chiori Hori, Huda AlAmri, Jue Wang, Gordon Wichern, Takaaki Hori, Anoop Cherian, Tim K. Marks, Vincent Cartillier, Raphael Gontijo Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh
Published in: ICASSP (2019)
Keyphrases
- audio visual
- end to end
- visual data
- audio features
- multimodal fusion
- video scene
- person authentication
- multi modal
- video summarization
- multimedia
- visual information
- video sequences
- multi stream
- scalable video
- text localization and recognition
- audio visual content
- video data
- low level
- feature extraction
- multimedia data
- feature space
- audio visual speech recognition
- feature vectors
- video streaming
- compressed video
- image data
- spatio temporal
- image sequences
- feature selection