MMTF: Multi-Modal Temporal Fusion for Commonsense Video Question Answering.
Mobeen Ahmad, Geonwoo Park, Dongchan Park, Sanguk Park
Published in: ICCV (Workshops) (2023)
Keyphrases
- multi modal
- question answering
- multi modality
- semantic concepts
- video search
- passage retrieval
- audio visual
- information extraction
- information retrieval
- question classification
- natural language
- multimedia
- qa clef
- video data
- video sequences
- video content
- single modality
- syntactic information
- answer validation
- cross language
- video streams
- video clips
- multiple modalities
- video analysis
- natural language questions
- semantic roles
- natural language processing
- machine learning
- knowledge base
- answer extraction
- commonsense reasoning
- question answering systems
- video shots
- video frames
- metadata
- key frames
- video retrieval