caWaC - A web corpus of Catalan and its application to language modeling and machine translation.
Nikola Ljubesic, Antonio Toral
Published in: LREC (2014)
Keyphrases
- language modeling
- machine translation
- cross lingual
- language model
- statistical machine translation
- comparable corpora
- parallel corpus
- finite state transducers
- parallel corpora
- retrieval model
- machine translation system
- language independent
- translation model
- information retrieval
- n gram
- query expansion
- sentence retrieval
- probabilistic model
- word alignment
- information extraction
- web pages
- web documents
- cross language information retrieval
- query translation
- natural language processing
- target language
- document retrieval
- cross language
- natural language
- text classification
- word level
- artificial intelligence
- relevance model
- linguistic resources
- machine learning
- bilingual dictionaries
- information access
- retrieval effectiveness
- test collection