Метод обнаружения дубликатов в потоке текстовых документов (The Method of Detecting Duplicates in a Stream of Text Documents).
Arkady Andreev
Dmitry Berezkin
Ilya Kozlov
Konstantin Simakov
Published in:
RCDL (2014)
Keyphrases
text documents
databases
prior knowledge
text classification
natural language
similarity measure
feature selection
data sets
active learning
text categorization
clustering algorithm
multiscale
knowledge discovery
information extraction
text mining
search engine
model selection
neural network
document clustering