BBT-Fin: Comprehensive Construction of Chinese Financial Domain Pre-trained Language Model, Corpus and Benchmark.
Dakuan Lu, Jiaqing Liang, Yipei Xu, Qianyu He, Yipeng Geng, Mengkun Han, Yingsi Xin, Hengkui Wu, Yanghua Xiao. Published in: CoRR (2023)
Keyphrases
- language model
- pre-trained
- language modeling
- n-gram
- statistical machine translation
- document-level
- probabilistic model
- retrieval model
- query expansion
- speech recognition
- document retrieval
- multiword
- ad hoc information retrieval
- information retrieval
- translation model
- test collection
- mixture model
- Chinese-English
- training data
- training examples
- word segmentation
- context-sensitive
- relevance model
- smoothing methods
- data sets
- text classification
- co-occurrence
- small number
- query terms
- labeled data