A Document Page Classification Algorithm in Copy Pipeline.
Xiaogang Dong, Peter Majewicz, Gordon McNutt, Charles A. Bouman, Jan P. Allebach, Ilya Pollak
Published in: ICIP (3) (2007)
Keyphrases
- classification algorithm
- document classification
- knn
- keywords
- k nearest neighbor
- website
- page layout analysis
- hierarchical classification
- support vector machine
- training phase
- document images
- classification method
- naive bayes
- web pages
- document type
- concept drift
- classification rules
- training set
- learning algorithm
- html documents
- class labels
- text documents
- web documents
- accurate classification
- document collections
- document representation
- information retrieval
- page layout
- nearest neighbor
- databases
- text categorization
- input features
- training data
- click logs
- neural network