ADOPD: A Large-Scale Document Page Decomposition Dataset.
Jiuxiang Gu, Xiangxi Shi, Jason Kuen, Lu Qi, Ruiyi Zhang, Anqi Liu, Ani Nenkova, Tong Sun
Published in: ICLR (2024)
Keyphrases
- keywords
- page layout analysis
- website
- html documents
- web pages
- benchmark datasets
- text documents
- document type
- document images
- document collections
- real world
- information retrieval systems
- document retrieval
- feature set
- www pages
- information retrieval
- web documents
- million images
- web crawler
- document classification
- small scale
- retrieval systems
- real life
- tf idf
- synthetic datasets
- document analysis
- digital libraries
- page layout
- ranked list
- document representation
- decomposition algorithm
- database
- relevant documents
- user queries