Character, Word, or Both? Revisiting the Segmentation Granularity for Chinese Pre-trained Language Models.
Xinnian Liang, Zefan Zhou, Hui Huang, Shuangzhi Wu, Tong Xiao, Muyun Yang, Zhoujun Li, Chao Bian
Published in: CoRR (2023)
Keyphrases
- word segmentation
- language model
- n-gram
- language modeling
- pre-trained
- statistical language modeling
- word recognition
- out-of-vocabulary
- document retrieval
- language-independent
- speech recognition
- retrieval model
- probabilistic model
- information retrieval
- handwriting recognition
- word-level
- cross-lingual
- translation model
- test collection
- query expansion
- data sets
- query terms
- training data
- control signals
- support vector
- face recognition
- image segmentation
- learning algorithm