OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.
Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, Min Dou, Changyao Tian, Xizhou Zhu, Lewei Lu, Yushi Chen, Junjun He, Zhongying Tu, Tong Lu, Yali Wang, Limin Wang, Dahua Lin, Yu Qiao, Botian Shi, Conghui He, Jifeng Dai
Published in: CoRR (2024)
Keyphrases
- image data
- input image
- ground truth
- image database
- image analysis
- image annotation
- broad coverage
- image features
- text information
- image collections
- edge detection
- image retrieval
- multi modal
- three dimensional
- image classification
- pixel level
- web images
- image registration
- supervised machine learning
- open domain
- image quality
- multiple modalities
- text classification
- image set
- text mining
- image search
- text detection