Two LRL & Distractor Corpora from Web Information Retrieval and a Small Case Study in Language Identification without Training Corpora.
Armin Hoenen, Cemre Koc, Marc Rahn
Published in: SLTU/CCURL@LREC (2020)
Keyphrases
- web information retrieval
- language identification
- training corpora
- training corpus
- web search
- text summarization
- training data
- parallel corpora
- natural language processing
- information retrieval
- text classification
- query refinement
- multimedia
- statistical machine translation
- named entity recognition
- machine translation
- EM algorithm
- probabilistic model