Heuristic based script identification from multilingual text documents.
M. Swamy Das, D. Sandhya Rani, C. R. K. Reddy
Published in: RAIT (2012)
Keyphrases
- text documents
- text mining
- text classification
- text categorization
- comparable corpora
- news articles
- information extraction
- topic models
- cross lingual
- keywords
- wordnet
- named entities
- document clustering
- document classification
- bag of words
- text collections
- text data
- cross language information retrieval
- automatic text categorization
- multiscale
- document collections
- k nearest neighbor
- high dimensional
- data sets