OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation.
Kepan Nan, Rui Xie, Penghao Zhou, Tiehan Fan, Zhenheng Yang, Zhijie Chen, Xiang Li, Jian Yang, Ying Tai
Published in: CoRR (2024)
Keyphrases
- high quality
- trecvid multimedia event detection
- video collections
- text generation
- event detection
- video dataset
- event recognition
- video segments
- video search
- human actions
- natural language descriptions
- weakly labeled
- text detection
- video sequences
- news video
- video data
- video content
- video clips
- real time
- video streams
- database
- audio content
- natural language generation
- video database
- low quality
- video retrieval
- keywords
- real life
- higher quality
- information retrieval
- digital video
- multimedia documents
- multimedia data
- action recognition
- street view
- space time
- natural scene images
- closed captions
- video analysis
- multimedia search
- text mining
- multimedia
- image quality
- benchmark datasets
- text data
- semantic labels
- web pages
- video frames
- synthetic datasets
- key frames
- real world
- text documents