Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding.
Haoli Bai, Zhiguang Liu, Xiaojun Meng, Wentao Li, Shuang Liu, Yifeng Luo, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu
Published in: ACL (1) (2023)
Keyphrases
- multi-modal
- fine-grained
- document understanding
- cross-modal
- coarse-grained
- designing effective
- video search
- access control
- multi-modality
- single modality
- high-dimensional
- automatic summarization
- automatic text summarization
- visual information
- document clustering
- visual features
- low-level
- information retrieval
- multiple modalities