Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding.
Haoli Bai, Zhiguang Liu, Xiaojun Meng, Wentao Li, Shuang Liu, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu
Published in: CoRR (2022)
Keyphrases
- multi modal
- fine grained
- document understanding
- cross modal
- coarse grained
- video search
- designing effective
- multi modality
- automatic text summarization
- document clustering
- access control
- single modality
- visual information
- automatic summarization
- high dimensional
- multi document summarization
- visual features
- image annotation
- high level
- language independent
- image search