Introducing Syllable Tokenization for Low-resource Languages: A Case Study with Swahili.
Jesse Atuhurra, Hiroyuki Shindo, Hidetaka Kamigaito, Taro Watanabe
Published in: CoRR (2024)
Keyphrases
- n-gram
- character n-grams
- case study
- expressive power
- language independent
- databases
- named entities
- resource management
- neural network
- digital libraries
- biomedical information retrieval
- data sets
- syntactic and semantic dependencies
- multilingual
- word level
- target language
- high levels
- cross language information retrieval
- web resources
- resource allocation