COIN - an Inexpensive and Strong Baseline for Predicting Out of Vocabulary Word Embeddings.
Andrew T. Schneider, Lihong He, Zhijia Chen, Arjun Mukherjee, Eduard C. Dragut
Published in: COLING (2022)
Keyphrases
- out of vocabulary
- word segmentation
- n gram
- language model
- spoken document retrieval
- query words
- named entity recognition
- broadcast news
- cross language information retrieval
- parallel corpora
- hand crafted
- cross lingual
- named entities
- spoken term detection
- previously unseen
- query terms
- information extraction
- vector space
- term frequency
- word level
- language modeling
- text classification
- low dimensional
- language independent
- labor intensive
- sentence level
- retrieval model
- word pairs
- machine translation
- speech recognition
- natural language processing
- semi supervised