Patterns of text reuse in a scientific corpus.
Daniel T. Citron, Paul Ginsparg
Published in: Proc. Natl. Acad. Sci. USA (2015)
Keyphrases
- scientific papers
- linguistic patterns
- supervised machine learning
- broad coverage
- open domain
- newspaper articles
- natural language text
- sentence level
- topic segmentation
- plain text
- scientific literature
- lexical features
- information retrieval
- pattern mining
- text retrieval
- document corpus
- lexico-syntactic
- linguistic information
- text documents
- manually annotated
- text mining
- multiword
- extraction patterns
- information extraction
- anaphora resolution
- word pairs
- text corpora
- data mining techniques
- text collections
- text data
- historical documents
- named entity disambiguation
- scientific documents
- database
- training corpus
- textual data
- scientific data
- semantic relations
- co-occurrence
- natural language processing
- keywords
- search engine
- artificial intelligence
- data mining