The Extraction of Text/Graphs from Degraded Documents.
Shwu-Huey Yen, Yi-Jin Chen, Hui-Jen Lin, Chia-Jen Wang
Published in: MMM (2004)
Keyphrases
- text documents
- free text
- OCR systems
- information extraction
- information retrieval
- web documents
- text extraction
- digital documents
- optical character recognition
- text mining
- latent semantic analysis
- plagiarism detection
- textual data
- keywords
- document analysis
- text information
- natural language text
- text collections
- document processing
- text data
- text analysis
- text retrieval
- textual information
- text clustering
- textual content
- text content
- textual documents
- page layout
- newspaper articles
- document content
- document collections
- printed documents
- handwritten text
- topic segmentation
- multimedia documents
- journal articles
- automatically extracted
- document images
- text corpus
- linguistic analysis
- electronic documents
- document retrieval
- document set
- key concepts
- document categorization
- document structure
- news stories
- text categorization
- information retrieval systems
- scientific literature
- scientific documents
- automatic categorization
- document level
- natural language processing
- sentence level
- text corpora
- semantic information
- structured documents
- relevant documents
- handwriting recognition
- text classification
- plain text
- retrieval engine
- text classifiers
- text lines
- document clustering
- document representation
- scanned documents
- related documents
- structured data
- topic models
- multi document summarization
- XML documents
- digital libraries
- metadata
- web pages
- search engine