CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset.
Tian Gan, Qing Wang, Xingning Dong, Xiangyuan Ren, Liqiang Nie, Qingpei Guo
Published in: CVPR (2023)
Keyphrases
- trecvid multimedia event detection
- video collections
- chinese text
- event detection
- event recognition
- text summarization
- text detection
- temporal filtering
- natural language descriptions
- video search
- video data
- chinese texts
- video content
- video sequences
- english text
- news video
- human actions
- video analysis
- video frames
- public space
- multimedia documents
- information retrieval
- keyword extraction
- multimedia
- audio content
- chinese web
- video dataset
- video segments
- video retrieval
- text retrieval
- weakly labeled
- million images
- video clips
- online sources
- natural scene images
- real world
- real time
- database
- video database
- multimedia search
- video shots
- spatial filtering
- topic segmentation
- lecture videos
- video streams
- space time
- closed captions