arTenTen: Arabic Corpus and Word Sketches.
Tressy Arts, Yonatan Belinkov, Nizar Habash, Adam Kilgarriff, Vit Suchomel
Published in: J. King Saud Univ. Comput. Inf. Sci. (2014)
Keyphrases
- unknown words
- word frequencies
- morphological analysis
- text corpus
- English words
- word pairs
- multiword
- word sense
- sentence level
- handwritten words
- lexical features
- training corpus
- linguistic information
- statistical machine translation
- natural language text
- word segmentation
- printed text
- POS taggers
- co-occurrence
- handwritten documents
- noun phrases
- POS tagging
- spontaneous speech
- word co-occurrence
- related words
- word recognition
- language processing
- part of speech
- word sense disambiguation
- n-gram
- sentence pairs
- Arabic documents
- word frequency
- machine translation system
- document level
- probabilistic context-free grammars
- text corpora
- text mining
- natural language processing
- information retrieval