ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations.
Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang
Published in: EMNLP (Findings) (2020)
Keyphrases
- n gram
- chinese text
- word segmentation
- language model
- language modelling
- bag of words
- language independent
- text classification
- language modeling
- variable length
- part of speech
- training set
- viterbi algorithm
- web documents
- language specific
- inside outside algorithm
- bit rate
- logic programs
- character n grams
- machine learning