Text-Guided Object Detector for Multi-modal Video Question Answering.
Ruoyue Shen, Nakamasa Inoue, Koichi Shinoda
Published in: WACV (2023)
Keyphrases
- multi modal
- question answering
- video search
- object detectors
- multiple modalities
- syntactic information
- information retrieval
- object detection
- information extraction
- natural language processing
- natural language
- video data
- video content
- image annotation
- video frames
- audio visual
- video sequences
- object categories
- video retrieval
- text mining
- relation extraction
- text documents
- text retrieval
- object recognition
- bounding box
- key frames
- pairwise
- keywords
- multiscale
- multimedia