iPerceive: Applying Common-Sense Reasoning to Multi-Modal Dense Video Captioning and Video Question Answering.
Aman Chadha, Gurneet Arora, Navpreet Kaloty
Published in: CoRR (2020)
Keyphrases
- multi modal
- question answering
- video search
- semantic concepts
- video data
- multimedia
- video content
- natural language processing
- video analysis
- video sequences
- audio visual
- information extraction
- information retrieval
- multiple modalities
- question classification
- video frames
- video database
- passage retrieval
- video retrieval
- cross language
- qa clef