The Lowest Form of Flattery: Characterising Text Re-Use and Plagiarism Patterns in a Digital Library Corpus.
George Buchanan, Dana McKay
Published in: JCDL (2017)
Keyphrases
- digital libraries
- plagiarism detection
- digital documents
- linguistic patterns
- supervised machine learning
- text data
- open domain
- text corpus
- information retrieval
- text mining
- data mining techniques
- plain text
- anaphora resolution
- text retrieval
- newspaper articles
- English words
- broad coverage
- free text
- scientific papers
- document corpus
- named entity disambiguation
- database
- machine readable form
- topic segmentation
- world knowledge
- training corpus
- databases
- text collections
- linguistic information
- sentence level
- manually annotated
- text corpora
- structural features
- word pairs
- named entities
- document collections
- information extraction systems
- lexical features
- source code
- information extraction
- recognizing textual entailment
- lexico-syntactic
- metadata