FastDocode: Finding Approximated Segments of N-Grams for Document Copy Detection - Lab Report for PAN at CLEF 2010.
Gabriel Oberreuter, Gaston L'Huillier, Sebastián A. Ríos, Juan D. Velásquez
Published in: CLEF (Notebook Papers/LABs/Workshops) (2010)
Keyphrases
- n-gram
- copy detection
- language model
- information retrieval
- query expansion
- web documents
- text classification
- passage retrieval
- bag of words
- language modeling
- relevance ranking
- language independent
- word level
- question answering
- information retrieval systems
- multilingual information retrieval
- document collections
- ad hoc retrieval
- variable length
- document retrieval
- text documents
- part of speech
- retrieval systems
- document ranking
- language modelling
- document representation
- relevant documents
- document analysis
- document images
- test collection
- cross language
- tf-idf
- document clustering
- digital libraries
- keywords