Language Classification and Segmentation of Noisy Documents in Hebrew Scripts.
Alex Zhicharevich, Nachum Dershowitz
Published in: LaTeCH@EACL (2012)
Keyphrases
- document classification
- automatic classification
- pattern recognition
- topic segmentation
- segmentation algorithm
- classify documents
- automatic categorization
- scripting language
- web documents
- document collections
- pre-classified
- morphological segmentation
- pixel classification
- classification algorithm
- segmentation method
- information retrieval
- classification accuracy
- decision trees
- feature extraction
- multiscale
- text classification
- supervised learning
- information retrieval systems
- computational linguistics
- feature vectors
- text documents
- document retrieval
- region growing
- support vector machine
- xml documents
- document categorization
- digital libraries
- support vector
- feature set
- medical images