N-Grams and Morphological Normalization in Text Classification: A Comparison on a Croatian-English Parallel Corpus.
Artur Silic, Jean-Hugues Chauchat, Bojana Dalbelo Basic, Annie Morin
Published in: EPIA Workshops (2007)
Keyphrases
- n gram
- parallel corpus
- text classification
- language independent
- cross lingual
- bag of words
- language modeling
- feature selection
- cross language
- text categorization
- text mining
- document classification
- language specific
- machine learning
- language model
- word alignment
- sentiment analysis
- cross language information retrieval
- statistical machine translation
- text documents
- part of speech
- machine translation
- character n grams
- labeled data
- word level
- parallel corpora
- knn
- target language
- query translation
- machine translation system
- source language
- natural language
- web documents