Temporal Multimodal Graph Transformer With Global-Local Alignment for Video-Text Retrieval.
Zerun Feng, Zhimin Zeng, Caili Guo, Zheng Li
Published in: IEEE Trans. Circuits Syst. Video Technol. (2023)
Keyphrases
- text retrieval
- temporal information
- space-time
- multimedia
- image retrieval
- video sequences
- document collections
- inverted file
- video frames
- information retrieval
- video data
- document retrieval
- query expansion
- structured data
- video content
- retrieval model
- video retrieval
- retrieval quality
- graph structure
- retrieval systems
- key frames
- multi-modal
- video clips
- cross-language
- relevant documents
- spatio-temporal
- spatial information
- video shots
- image sequences
- e-learning