Beyond the English Web: Zero-Shot Cross-Lingual and Lightweight Monolingual Classification of Registers.
Liina Repo, Valtteri Skantsi, Samuel Rönnqvist, Saara Hellström, Miika Oinonen, Anna Salmela, Douglas Biber, Jesse Egbert, Sampo Pyysalo, Veronika Laippala
Published in: CoRR (2021)
Keyphrases
- cross lingual
- lightweight
- text classification
- machine translation
- cross language
- cross lingual information retrieval
- parallel corpus
- european languages
- event extraction
- language modeling
- language independent
- indian languages
- language specific
- mono lingual
- machine learning
- word alignment
- wireless sensor networks
- word sense
- news articles
- web pages
- query translation
- cross language retrieval
- answer extraction
- document clustering
- natural language
- supervised learning
- parallel corpora
- translation model
- n gram
- language model
- co occurrence
- text mining
- document collections
- statistical machine translation
- cross language information retrieval