Optimizing Word Alignments with Better Subword Tokenization.
Anh Khoa Ngo Ho, François Yvon
Published in: MTSummit (1) (2021)
Keyphrases
- n-gram
- character n-grams
- spoken document retrieval
- co-occurrence
- pairwise
- language model
- out-of-vocabulary
- named entities
- word segmentation
- variable length
- language independent
- text classification
- language modeling
- part-of-speech
- word level
- machine translation
- broadcast news
- test collection
- biomedical text
- probabilistic model