Investigation of feature selection for historical document layout analysis.
Hao Wei, Kai Chen, Anguelos Nicolaou, Marcus Liwicki, Rolf Ingold
Published in: IPTA (2014)
Keyphrases
- feature selection
- document images
- information retrieval
- document clustering
- text categorization
- information retrieval systems
- retrieval systems
- multi-class
- mutual information
- dimensionality reduction
- machine learning
- document classification
- text classifiers
- text documents
- keywords
- document collections
- method for feature selection
- information gain
- feature selection algorithms
- structured documents
- unsupervised feature selection
- web documents
- model selection
- text classification
- feature extraction
- feature set
- classification accuracy
- vector space model
- document analysis
- selected features
- support vector
- database
- user queries
- document frequency
- digital forensics
- feature space
- term frequency
- image retrieval
- document representation
- tf-idf
- historical data