Structured Video-Language Modeling with Temporal Grouping and Spatial Grounding.
Yuanhao Xiong, Long Zhao, Boqing Gong, Ming-Hsuan Yang, Florian Schroff, Ting Liu, Cho-Jui Hsieh, Liangzhe Yuan
Published in: ICLR (2024)
Keyphrases
- language modeling
- spatial and temporal
- temporal correlation
- language model
- spatial temporal
- space time
- temporal resolution
- spatio temporal
- video frames
- temporal redundancy
- information retrieval
- temporal relationships
- retrieval model
- query expansion
- temporal continuity
- temporal information
- n gram
- cross lingual
- probabilistic model
- video sequences
- spatial information
- video data
- text classification
- spatial features
- statistical language models
- improvements in retrieval effectiveness
- multimedia
- relevance model
- mixture model
- pseudo relevance feedback
- retrieval effectiveness
- smoothing methods
- document retrieval
- relevance feedback