Historical document digitization through layout analysis and deep content classification.
Andrea Corbelli, Lorenzo Baraldi, Costantino Grana, Rita Cucchiara
Published in: ICPR (2016)
Keyphrases
- document classification
- pattern recognition
- web documents
- decision trees
- classification accuracy
- support vector machine (SVM)
- automatic classification
- feature vectors
- multimedia documents
- classification method
- feature space
- support vector
- keywords
- text content
- image classification
- textual content
- retrieval systems
- classification algorithm
- document content
- unsupervised learning
- support vector machine
- training set
- text classification
- machine learning
- information retrieval systems
- document clustering
- XML documents
- training data
- feature selection
- information retrieval