ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations.

Shizhe Diao, Jiaxin Bai, Yan Song, Tong Zhang, Yonggang Wang

Published in: EMNLP (Findings) (2020)

Keyphrases

n gram
chinese text
word segmentation
language model
language modelling
bag of words
language independent
text classification
language modeling
variable length
part of speech
training set
viterbi algorithm
web documents
language specific
inside outside algorithm
bit rate
logic programs
character n grams
machine learning