Segmenting documents by stylistic character.
Neil Graham, Graeme Hirst, Bhaskara Marthi
Published in: Nat. Lang. Eng. (2005)
Keyphrases
- information retrieval
- document collections
- web documents
- authorship attribution
- printed documents
- information retrieval systems
- metadata
- xml documents
- optical character recognition
- vector space model
- document classification
- relevant documents
- text documents
- document analysis
- free text
- document clustering
- legal documents
- keywords
- ocr systems
- scanned documents
- word spotting
- handwritten characters
- textual content
- document representation
- latent semantic analysis
- document retrieval
- retrieval systems
- image segmentation
- clustering algorithm
- expert finding
- plagiarism detection
- database
- document processing
- electronic documents
- document content
- digital documents
- text classification
- digital libraries
- printed text
- printed characters