Analyzing the Impact of Tokenization on Multilingual Epidemic Surveillance in Low-Resource Languages.
Stephen Mutuvi, Emanuela Boros, Antoine Doucet, Gaël Lejeune, Adam Jatowt, Moses Odeo
Published in: ICDAR (3) (2023)
Keyphrases
- language independent
- cross lingual
- multi lingual
- multilingual information retrieval
- multilingual documents
- language specific
- character n grams
- language resources
- multilingual retrieval
- machine translation
- cross lingual information retrieval
- named entities
- cross language
- digital libraries
- outbreak detection
- public health
- resource management
- information resources
- surveillance system
- expressive power
- cross language information retrieval
- web resources
- resource constraints
- resource allocation
- n gram
- real time
- query translation
- text summarization
- document retrieval
- document collections
- text classification
- dublin city university