Question-Instructed Visual Descriptions for Zero-Shot Video Answering.
David Mogrovejo, Thamar Solorio. Published in: ACL (Findings) (2024)
Keyphrases
- natural language descriptions
- visual data
- visual cues
- visual information
- video search
- visual analysis
- visual features
- video streams
- high level
- video frames
- video retrieval
- video database
- video sequences
- multimedia
- video content
- video data
- visual saliency
- query answering
- low level
- real time video
- news video
- video indexing and retrieval
- multimedia data
- video shots
- digital video
- visual perception
- semantic concepts
- content based video retrieval
- space time
- video analysis
- key frames
- event recognition
- visual input
- spatial and temporal
- natural language