Vector Representation of Words for Plagiarism Detection Based on String Matching.
Kensuke Baba, Tetsuya Nakatoh, Toshiro Minami
Published in: HCI (4) (2017)
Keyphrases
- string matching
- vector representation
- plagiarism detection
- document representation
- pattern matching
- source code
- bag of words
- similarity measure
- edit distance
- regular expressions
- suffix tree
- cross-language
- n-gram
- text documents
- keywords
- language model
- vector space
- web documents
- document collections
- document clustering
- vector space model
- distance measure
- semantic information
- machine learning
- structured data
- natural language
- web pages