EcomGPT-CT: Continual Pre-training of E-commerce Large Language Models with Semi-structured Data.
Shirong Ma, Shen Huang, Shulin Huang, Xiaobin Wang, Yangning Li, Hai-Tao Zheng, Pengjun Xie, Fei Huang, Yong Jiang
Published in: CoRR (2023)
Keyphrases
- language model
- semi-structured data
- language modeling
- structured data
- semi-structured
- web mining
- probabilistic model
- document retrieval
- language modelling
- n-gram
- retrieval model
- XML documents
- XML data
- speech recognition
- test collection
- statistical language models
- information retrieval
- query expansion
- smoothing methods
- context-sensitive
- web data
- data model
- language models for information retrieval
- relevance model
- training set
- query terms
- heterogeneous data
- path expressions
- data sets