Harvest Uyghur-Chinese Aligned-Sentences Bitexts from Multilingual Sites Based on Word Embedding.
ShaoLin Zhu, Xiao Li, Yating Yang, Lei Wang, Chenggang Mi
Published in: CCL (2017)
Keyphrases
- word segmentation
- chinese text
- sentence level
- machine translation system
- text summarization
- chinese word segmentation
- language independent
- chinese english
- cross lingual
- word alignment
- text corpus
- unknown words
- language specific
- natural language
- text generation
- parallel corpus
- syntactic analysis
- word meanings
- word sense
- statistical machine translation
- n gram
- sentence similarity
- lexical features
- co occurrence
- syntactic information
- noun phrases
- tree bank
- cross language information retrieval
- machine translation
- english text
- word frequency
- syntactic categories
- source language
- word level
- digital libraries
- cross language
- wordnet
- question answering
- text classification
- automatic summarization
- website
- word sense disambiguation
- vector space
- natural language generation
- training corpus
- natural language processing
- word pairs
- indian languages
- linguistic features
- probabilistic context free grammars
- word meaning
- search engine
- semantic information
- comparable corpora
- word order
- out of vocabulary
- keyword extraction
- multi document summarization
- translation model
- multiword