Text Separation from Mixed Documents Using a Tree-Structured Classifier.
Xujun Peng, Srirangaraj Setlur, Venu Govindaraju, Ramachandrula Sitaram
Published in: ICPR (2010)
Keyphrases
- text classifiers
- text documents
- free text
- digital documents
- unstructured text
- text classification
- information retrieval
- text categorization
- keywords
- structured data
- web documents
- document categorization
- text analysis
- document analysis
- text data
- text information
- document content
- document classification
- textual content
- training documents
- textual data
- text collections
- document processing
- text retrieval
- multimedia documents
- latent semantic analysis
- plagiarism detection
- automatic categorization
- text content
- handwritten text
- key concepts
- information extraction
- semantic information
- text mining
- newspaper articles
- document clustering
- text corpora
- natural language text
- journal articles
- electronic documents
- textual information
- tree structure
- support vector machine
- text corpus
- handwriting recognition
- document level
- binary decision tree
- classify documents
- document set
- feature selection
- scientific literature
- page layout
- classification algorithm
- digital libraries
- printed documents
- classification trees
- metadata
- training data
- topic segmentation
- decision tree classifiers
- document collections
- related documents
- xml documents
- feature space
- naive bayes
- decision trees
- document representation
- information retrieval systems
- feature set
- natural language processing
- topic models
- retrieval systems
- document retrieval