Language morphology offset: Text classification on a Croatian-English parallel corpus.
M. Malenica, T. Smuc, Jan Snajder, Bojana Dalbelo Basic
Published in: Inf. Process. Manag. (2008)
Keyphrases
- parallel corpus
- text classification
- cross-lingual
- language independent
- word forms
- machine translation system
- query translation
- text categorization
- bag of words
- machine translation
- cross-language
- feature selection
- word alignment
- language modeling
- cross-language information retrieval
- sentence pairs
- labeled data
- n-gram
- statistical machine translation
- document classification
- machine learning
- source language
- text classifiers
- target language
- sentiment analysis
- text mining
- sentiment classification
- word level
- kNN
- indian languages
- semantic features
- co-occurrence
- information extraction
- bilingual dictionaries
- probabilistic model
- retrieval systems
- data mining