An n-gram-based approach for detecting approximately duplicate database records.
Zengping Tian
Hongjun Lu
Wenyun Ji
Aoying Zhou
Zhong Tian
Published in:
Int. J. Digit. Libr. (2002)
Keyphrases
n gram
database
language model
databases
text classification
variable length
language independent
language modeling
data cleaning
bag of words
part of speech
language modelling
machine learning
data model
finite state transducers