Deciphering and Characterizing Out-of-Vocabulary Words for Morphologically Rich Languages.
Georgie Botev, Arya D. McCarthy, Winston Wu, David Yarowsky
Published in: COLING (2022)
Keyphrases
- out-of-vocabulary
- language-specific
- cross-lingual
- parallel corpora
- word segmentation
- n-gram
- language-independent
- language model
- cross-language information retrieval
- named entity recognition
- spoken document retrieval
- word forms
- broadcast news
- query translation
- machine translation
- named entities
- text summarization
- language modeling
- bilingual dictionaries
- word-level
- machine translation system
- statistical machine translation
- hand-crafted
- text classification
- natural language
- natural language processing
- word pairs
- query terms
- bag-of-words
- parallel corpus
- co-occurrence
- information extraction
- word recognition
- labor-intensive
- language processing
- machine learning
- retrieval model