Script Normalization for Unconventional Writing of Under-Resourced Languages in Bilingual Communities.
Sina Ahmadi, Antonios Anastasopoulos
Published in: ACL (1) (2023)
Keyphrases
- indian languages
- cross lingual
- language identification
- parallel corpora
- query translation
- machine translation
- target language
- comparable corpora
- language independent
- language resources
- document images
- cross lingual information retrieval
- sentence pairs
- bilingual dictionaries
- machine translation system
- social network analysis
- cross language
- preprocessing
- source language
- statistical machine translation
- normalization method
- multiword
- social networks
- multilingual retrieval
- databases
- web communities
- translation model
- word segmentation
- expressive power
- query expansion
- text classification
- word alignment
- linguistic resources
- community detection
- online communities
- news articles
- knowledge sharing
- bilingual lexicon
- machine readable dictionaries