ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations.

Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang

Published in: CoRR (2019)

Keyphrases

n gram
chinese text
word segmentation
language model
language independent
bag of words
language modelling
text classification
variable length
viterbi algorithm
part of speech
bit rate
language modeling
inside outside algorithm
training set
feature space
character n grams
topic models
collaborative filtering
data analysis
artificial intelligence
information retrieval