DocBank: A Benchmark Dataset for Document Layout Analysis.
Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
Published in: CoRR (2020)
Keyphrases
- benchmark datasets
- document images
- document collections
- retrieval systems
- pedestrian detection
- information retrieval
- information retrieval systems
- web documents
- document classification
- digital documents
- database
- document retrieval
- data structure
- electronic documents
- relational databases
- keywords
- structured documents
- terms of classification accuracy
- false positives
- document clustering
- text documents
- relevant documents
- text categorization
- digital libraries
- metadata
- data mining