DocBank: A Benchmark Dataset for Document Layout Analysis.
Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
Published in: COLING (2020)
Keyphrases
- benchmark datasets
- document collections
- text documents
- retrieval systems
- information retrieval
- web documents
- document clustering
- data sets
- keywords
- terms of classification accuracy
- information retrieval systems
- document images
- digital documents
- document analysis
- vector space model
- document classification
- document retrieval
- object detection
- document processing
- neural network