Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer.
Guangyi Chen, Xiao Liu, Guangrun Wang, Kun Zhang, Philip H. S. Torr, Xiao-Ping Zhang, Yansong Tang
Published in: CoRR (2023)
Keyphrases
- question answer
- image content
- single image
- multiscale
- input image
- image features
- image collections
- image classification
- image data
- image analysis
- semantic labels
- video frames
- textual descriptions
- video analysis
- low level
- visual data
- image representation
- image segmentation
- video data
- web images
- video streams
- visual features
- segmentation method
- edge detection
- multimedia
- information retrieval
- caption text
- text regions
- temporal continuity
- text information
- image search
- social network analysis
- text mining
- image retrieval
- search engine