A Taxonomy of Weeds: A Field Guide for Corpus Curators to Winnowing the Parallel Text Harvest.
Katherine Young, Jeremy Gwinnup, Lane Schwartz
Published in: AMTA (2) (2016)
Keyphrases
- supervised machine learning
- broad coverage
- text corpus
- text corpora
- text data
- parallel processing
- English words
- open domain
- information retrieval
- lexical features
- information extraction systems
- text retrieval
- world knowledge
- plain text
- named entity disambiguation
- recognizing textual entailment
- syntactic features
- text mining
- web documents
- newspaper articles
- natural language text
- text collections
- text content
- sentence level
- manually annotated
- textual data
- focused crawling
- semantic information
- key concepts
- word pairs
- free text
- information extraction
- anaphora resolution
- topic segmentation
- database
- document corpus