Patterns of Text Reuse in a Scientific Corpus.
Daniel T. Citron, Paul Ginsparg
Published in: CoRR (2014)
Keyphrases
- scientific papers
- linguistic patterns
- broad coverage
- open domain
- supervised machine learning
- text data
- text corpora
- text corpus
- sentence level
- pattern mining
- topic segmentation
- plain text
- English words
- multiword
- text retrieval
- scientific data
- text processing
- natural language text
- document corpus
- scientific literature
- text mining
- linguistic information
- natural language processing
- lexical features
- temporal expressions
- spontaneous speech
- entity extraction
- training corpus
- document level
- manually annotated
- free text
- frequent patterns
- web documents
- data mining techniques
- newspaper articles
- textual features
- world knowledge
- noun phrases
- database
- recognizing textual entailment
- conversational speech
- scientific documents