UnihanLM: Coarse-to-Fine Chinese-Japanese Language Model Pretraining with the Unihan Database.
Canwen Xu, Tao Ge, Chenliang Li, Furu Wei
Published in: AACL/IJCNLP (2020)
Keyphrases
- language model
- coarse to fine
- language modeling
- multiscale
- n gram
- multiresolution
- hierarchical segmentation
- speech recognition
- information retrieval
- document retrieval
- probabilistic model
- image registration
- object detection
- ad hoc information retrieval
- query expansion
- retrieval model
- statistical language models
- language modelling
- mixture model
- context sensitive
- language model for information retrieval
- smoothing methods
- cross language retrieval
- word segmentation
- relevance model
- dynamic programming
- test collection
- training data
- translation model
- query specific
- query terms
- text mining
- supervised learning
- learning algorithm