A System for Identifying and Exploring Text Repetition in Large Historical Document Corpora.
Aleksi Vesanto
Filip Ginter
Hannu Salmi
Asko Nivala
Tapio Salakoski
Published in:
NODALIDA (2017)
Keyphrases
document corpus
text documents
text collections
text corpus
historical documents
text corpora
text data
web documents
information retrieval
digital documents
text content
topic segmentation
keywords
document processing
word frequency
document analysis
text clustering
textual content
document collections
document content
document clustering
text mining
scientific papers
scientific documents
text classifiers
multimedia documents
document images
semantic information
related documents
printed documents
historical manuscripts
natural language processing
text summarization
document representation
document retrieval
text classification
training corpus
textual documents
document structure
text analysis
textual data
information retrieval systems
retrieval engine
free text
electronic documents
automatic text summarization
text categorization
news articles
linguistic patterns
search engine
structured documents
topic models
automatic summarization
latent semantic analysis
statistical machine translation