A fast algorithm to binarize and filter documents with back-to-front interference.
João Marcelo Monte da Silva, Rafael Dueire Lins
Published in: SAC (2007)
Keyphrases
- information retrieval
- document retrieval
- information retrieval systems
- document collections
- web documents
- relevant documents
- text documents
- xml documents
- metadata
- document clustering
- legal documents
- vector space
- preprocessing step
- document representation
- latent semantic analysis
- semantic information
- text categorization
- noise reduction
- free text
- document classification
- multipath
- noise removal
- locality sensitive
- noise ratio