A large English-Thai parallel corpus from the web and machine-generated text.
Lalita Lowphansirikul, Charin Polpanumas, Attapol T. Rutherford, Sarana Nutanong
Published in: Lang. Resour. Evaluation (2022)
Keyphrases
- parallel corpus
- machine translation system
- cross lingual
- web documents
- machine translation
- query translation
- cross language information retrieval
- language independent
- word alignment
- latent semantic analysis
- source language
- text information
- statistical machine translation
- sentence pairs
- web images
- target language
- web pages
- word segmentation
- text retrieval
- semantic space
- parallel corpora
- text documents
- cross language
- text mining
- keywords
- information retrieval
- semantic information
- digital libraries
- bilingual dictionaries