Discriminating between Similar Languages Using a Combination of Typed and Untyped Character N-grams and Words.
Helena Gómez-Adorno, Ilia Markov, Jorge Baptista, Grigori Sidorov, David Pinto
Published in: VarDial (2017)
Keyphrases
- character n-grams
- n-gram
- variable length
- language specific
- cross language
- language independent
- cross language information retrieval
- arabic documents
- language model
- optical character recognition
- cross lingual
- language modeling
- text classification
- machine learning
- out of vocabulary
- information extraction
- information seeking
- text retrieval
- machine translation
- bag of words
- question answering
- web documents
- query translation
- text categorization
- labor intensive
- word spotting
- keywords