Comparison and Classification of Documents Based on Layout Similarity.
Jianying Hu, Ramanujan S. Kashi, Gordon T. Wilfong
Published in: Inf. Retr. (2000)
Keyphrases
- document classification
- feature extraction
- similarity measure
- document collections
- support vector
- classification accuracy
- XML documents
- information retrieval
- pre-classified
- document clustering
- text classification
- feature space
- pattern recognition
- page layout
- decision trees
- automatic classification
- cosine similarity
- machine learning
- automatic categorization
- keywords
- vector space model
- training set
- training documents
- similarity scores
- text classifiers
- document image retrieval
- content similarity
- document categorization
- document analysis
- multi-document summarization
- relevant documents
- classification algorithm
- text categorization
- support vector machine