A German Corpus for Text Similarity Detection Tasks.
Juan-Manuel Torres-Moreno, Gerardo Sierra, Peter Peinl
Published in: CoRR (2017)
Keyphrases
- supervised machine learning
- open domain
- similarity measure
- broad coverage
- false positives
- world knowledge
- detection algorithm
- training corpus
- text data
- lexical features
- detection method
- text retrieval
- scientific papers
- natural language text
- text corpora
- information retrieval
- object detection
- plain text
- English words
- sentence similarity
- anaphora resolution
- contextual features
- topic segmentation
- document level
- sentence level
- manually annotated
- complex background
- text collections
- semantic information
- text representation
- word pairs
- newspaper articles
- edit distance
- distance measure
- entity extraction
- co-occurrence
- recognizing textual entailment
- text mining