Challenges of Diacritical Marker or Hudhaa Character in Tokenization of Oromo Text.
Abraham Tesso Nedjo, Degen Huang, Xiaoxia Liu
Published in: J. Softw. (2014)
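In the Qubee (Latin) orthography of Oromo, hudhaa marks a glottal stop and is written as an apostrophe inside words such as bu'aa and ta'e. This makes it easy for a generic tokenizer to break those words apart. The sketch below illustrates that problem. It is not the method of the paper. The set of apostrophe variants (U+0027, U+2019, U+02BC) and both function names are hypothetical choices made only for this illustration.

```python
import re

# Hypothetical assumption: hudhaa may appear as the ASCII apostrophe (U+0027)
# or as the typographic variants U+2019 and U+02BC. Illustrative set only.
HUDHAA_VARIANTS = "'\u2019\u02bc"


def naive_tokenize(text: str) -> list[str]:
    """Split on every non-word character, so hudhaa breaks a word in two."""
    return re.findall(r"\w+", text)


def hudhaa_aware_tokenize(text: str) -> list[str]:
    """Keep hudhaa inside a token when it sits between word characters."""
    # Map every assumed variant to the ASCII apostrophe so that one word
    # always yields one token, whichever apostrophe the writer typed.
    normalized = text.translate({ord(c): "'" for c in HUDHAA_VARIANTS})
    # A token is a run of word characters, optionally joined by single
    # internal apostrophes. Apostrophes at a token's edges are dropped.
    return re.findall(r"\w+(?:'\w+)*", normalized)


if __name__ == "__main__":
    print(naive_tokenize("bu'aa ta'e"))            # ['bu', 'aa', 'ta', 'e']
    print(hudhaa_aware_tokenize("bu'aa ta\u2019e"))  # ["bu'aa", "ta'e"]
```

Normalizing the variants first means the same word is not split into different vocabulary entries. The pattern still cannot tell hudhaa apart from an apostrophe used as a quotation mark between two letters. Resolving that ambiguity would need more context than a regular expression provides.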