Extracting Parallel Paragraphs and Sentences from English-Persian Translated Documents.
Mohammad Sadegh Rasooli, Omid Kashefi, Behrouz Minaei-Bidgoli
Published in: AIRS (2011)
Keyphrases
- source language
- target language
- linguistic features
- machine translation
- machine translation system
- linguistic analysis
- cross lingual
- natural language
- parallel corpora
- sentence level
- parallel corpus
- statistical machine translation
- Arabic language
- document summarization
- cross language
- text classification
- multi document summarization
- person names
- retrieved documents
- query translation
- multiword
- cross language information retrieval
- training corpus
- extractive summarization
- document retrieval
- document collections
- link grammar
- document set
- news stories
- sentiment analysis
- word alignment
- information retrieval systems
- relevant documents
- information retrieval
- text corpus
- information extraction
- highly ambiguous
- text documents
- language modeling
- word sense disambiguation
- sentence extraction
- sentence pairs
- document level
- Arabic documents
- news articles
- stop words
- test collection
- ad hoc retrieval
- related words