OneStopEnglish corpus: A new corpus for automatic readability assessment and text simplification.
Sowmya Vajjala, Ivana Lucic
Published in: BEA@NAACL-HLT (2018)
Keyphrases
- broad coverage
- open domain
- supervised machine learning
- text data
- recognizing textual entailment
- natural language text
- text corpora
- manually annotated
- english words
- newspaper articles
- plain text
- text corpus
- spontaneous speech
- linguistic patterns
- document corpus
- topic segmentation
- world knowledge
- conversational speech
- named entity disambiguation
- scientific papers
- document level
- sentence level
- semi automatic
- test set
- natural language processing
- information extraction
- linguistic information
- training corpus
- automatic text