ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations.
Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang
Published in: CoRR (2019)
Keyphrases
- n-gram
- chinese text
- word segmentation
- language model
- language independent
- bag of words
- language modelling
- text classification
- variable length
- viterbi algorithm
- part-of-speech
- bit rate
- language modeling
- inside-outside algorithm
- training set
- feature space
- character n-grams
- topic models
- collaborative filtering
- data analysis
- artificial intelligence
- information retrieval