Beyond Document Page Classification: Design, Datasets, and Challenges.
Jordy Van Landeghem, Sanket Biswas, Matthew B. Blaschko, Marie-Francine Moens
Published in: CoRR (2023)
Keyphrases
- document classification
- benchmark datasets
- UCI machine learning repository
- classification accuracy
- pattern recognition
- machine learning
- classification method
- website
- case study
- text classification
- real world
- support vector machine (SVM)
- classification algorithm
- document images
- design principles
- training dataset
- UCI repository
- support vector machine
- user interface
- information retrieval
- support vector
- web pages
- feature extraction
- document clustering
- page layout analysis
- WWW pages
- associative classifiers
- database
- decision trees
- information retrieval systems
- digital libraries
- class labels
- document collections
- supervised learning
- text categorization