Cleaning the Europarl Corpus for Linguistic Applications.
Johannes Graën, Dolores Batinic, Martin Volk
Published in: KONVENS (2014)
Keyphrases
- linguistic features
- linguistic information
- natural language text
- linguistic patterns
- reference resolution
- hand crafted
- natural language
- linguistic knowledge
- open domain
- manually annotated
- sentence level
- annotated corpus
- supervised machine learning
- genetic algorithm
- text corpora
- data cleaning
- text classification
- language model
- artificial intelligence