VLAP: Efficient Video-Language Alignment via Frame Prompting and Distilling for Video Question Answering.
Xijun Wang, Junbang Liang, Chun-Kai Wang, Kenan Deng, Yu Lou, Ming C. Lin, Shan Yang
Published in: CoRR (2023)
Keyphrases
- question answering
- video frames
- natural language
- video sequences
- video data
- video content
- key frames
- information retrieval
- multimedia
- natural language processing
- question classification
- information extraction
- probabilistic model
- named entities
- expert systems
- semantic relations
- relation extraction
- question answering systems