Multilingual Document Clustering: An Heuristic Approach Based on Cognate Named Entities.
Soto Montalvo, Raquel Martínez-Unanue, Arantza Casillas, Víctor Fresno
Published in: ACL (2006)
Keyphrases
- document clustering
- named entities
- text mining
- text documents
- cross-lingual
- information extraction
- named entity recognition
- question answering
- document collections
- co-occurrence
- natural language processing
- document representation
- text analysis
- language-independent
- cross-language
- clustering method
- news articles
- information retrieval
- text classification
- comparable corpora
- machine learning
- data mining
- document retrieval
- knowledge discovery
- k-means
- data analysis
- digital libraries
- similarity measure
- artificial intelligence
- probabilistic model
- knowledge base