Text string extraction within mixed-mode documents.
Frank Hönes, Jürgen Lichter
Published in: ICDAR (1993)
Keyphrases
- mixed mode
- text documents
- free text
- information retrieval
- digital documents
- information extraction
- web documents
- string matching
- document analysis
- keywords
- textual content
- document content
- text retrieval
- plagiarism detection
- document categorization
- multimedia documents
- text information
- text content
- text collections
- natural language text
- latent semantic analysis
- semantic information
- document collections
- text mining
- pattern matching
- electronic documents
- text classification
- database
- information retrieval systems
- xml documents
- document clustering
- text categorization
- text lines
- edit distance
- code generation
- relevant documents
- related documents
- extraction rules
- natural language processing
- case study
- handwritten documents
- wordnet
- data driven
- source code