MultiLeg: Dataset for Text Sanitisation in Less-resourced Languages.
Rinalds Viksna, Inguna Skadina
Published in: LREC/COLING (2024)
Keyphrases
- multi lingual
- text summarization
- english text
- language identification
- expressive power
- language independent
- database
- arabic language
- information retrieval
- manually constructed
- text retrieval
- textual data
- language specific
- text mining
- native language
- cross lingual
- free text
- natural language generation
- grammatical inference
- domain dependent
- text documents
- benchmark datasets
- information retrieval systems
- keywords
- indian languages
- databases