huPWKP: A Hungarian Text Simplification Corpus.
Noémi Prótár, Dávid Márk Nemeskey
Published in: RANLP (2023)
Keyphrases
- supervised machine learning
- text to speech
- open domain
- text data
- lexical features
- text corpora
- natural language text
- broad coverage
- text collections
- plain text
- document level
- text corpus
- world knowledge
- newspaper articles
- scientific papers
- text mining
- keywords
- recognizing textual entailment
- english words
- information retrieval
- database
- noun phrases
- text documents
- sentence level
- multiresolution
- free text
- topic segmentation
- textual features
- multiword
- anaphora resolution
- named entity disambiguation
- word pairs
- linguistic patterns
- document corpus
- information extraction systems
- spontaneous speech
- temporal expressions
- manually annotated
- linguistic information
- training corpus
- scientific literature
- writing style
- language model
- semantic information
- text retrieval
- entity extraction
- textual data
- text processing