Cross lingual text classification by mining multilingual topics from Wikipedia.
Xiaochuan Ni, Jian-Tao Sun, Jian Hu, Zheng Chen
Published in: WSDM (2011)
Keyphrases
- cross lingual
- text classification
- text mining
- web news
- monolingual and cross lingual
- text documents
- text data
- cross lingual information retrieval
- topic modeling
- cross language
- language modeling
- language independent
- bag of words
- text categorization
- multi lingual
- named entities
- topic models
- knn
- labeled data
- n gram
- parallel corpus
- language specific
- feature selection
- information retrieval
- probabilistic topic models
- data mining
- translation model
- natural language processing
- knowledge discovery
- machine learning
- machine translation system
- wikipedia articles
- text classifiers
- semantic features
- transfer learning
- document clustering
- text corpora
- machine translation
- document collections
- wordnet
- language model
- keywords
- latent dirichlet allocation
- retrieval model
- information retrieval systems
- natural language