Document Alignment for Generation of English-Punjabi Comparable Corpora from Wikipedia.
Vishal Goyal, Ajit Kumar, Manpreet Singh Lehal
Published in: Int. J. E Adopt. (2020)
Keyphrases
- comparable corpora
- wikipedia articles
- parallel corpora
- text documents
- cross language information retrieval
- document collections
- text corpora
- cross language
- machine translation
- wordnet
- news articles
- bilingual lexicon
- language modeling
- document representation
- semantic features
- named entities
- word alignment
- cross lingual
- document retrieval
- link structure
- word pairs
- linguistic resources
- document clustering
- bilingual dictionaries
- information retrieval
- information retrieval systems
- keywords
- language model
- text classifiers
- semantic information
- query translation
- language independent
- semantic relations
- text classification
- document classification
- query terms
- search queries
- text mining
- anchor text
- user generated content
- text collections
- labor intensive
- translation model
- parallel corpus