Keyphrases
- n-gram
- clone detection
- Linux kernel
- software systems
- software reuse
- metamodel
- string matching
- language model
- text classification
- bag of words
- source code
- variable length
- inside-outside algorithm
- software components
- operating system
- software engineering
- document retrieval
- web documents
- analysis tool
- software evolution
- nearest neighbor
- high-level
- search engine