Beyond the Document: Transcribing the Text of the Document and the Variant States of the Text.
Barbara Bordalejo
Published in: DH (2013)
Keyphrases
- text documents
- digital documents
- information retrieval
- document processing
- keywords
- web documents
- textual content
- document analysis
- text content
- document content
- text collections
- multimedia documents
- text clustering
- technical papers
- page layout analysis
- text mining
- document collections
- document categorization
- scientific documents
- scientific papers
- information retrieval systems
- document structure
- text retrieval
- document classification
- document set
- printed documents
- document images
- document corpus
- electronic documents
- text corpus
- database
- latent semantic analysis
- semantic information
- textual documents
- extractive summarization
- text summarization
- automatic text summarization
- document clustering
- noun phrases
- text representation
- scanned documents
- web pages
- pdf files
- temporal expressions
- retrieval engine
- text classification
- text categorization
- retrieval systems
- document representation
- text lines
- structured documents
- character recognition
- free text
- authorship attribution
- handwritten text
- test collection
- document type
- printed text
- tf idf