Lawformer: A pre-trained language model for Chinese legal long documents.
Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun
Published in: AI Open (2021)
Keyphrases
- language model
- pre-trained
- document retrieval
- ad hoc information retrieval
- information retrieval
- query terms
- document ranking
- vector space model
- language modeling
- n-gram
- retrieval model
- query expansion
- probabilistic model
- relevance model
- document collections
- query specific
- speech recognition
- document length
- relevant documents
- pseudo feedback
- word clouds
- training data
- test collection
- retrieved documents
- word segmentation
- multiword
- training examples
- information retrieval systems
- smoothing methods
- retrieval effectiveness
- language modeling framework
- pseudo relevance feedback
- retrieval systems
- document clustering
- inter-document similarities
- term frequency
- text documents
- statistical machine translation
- translation model