Lawformer: A Pre-trained Language Model for Chinese Legal Long Documents.
Chaojun Xiao, Xueyu Hu, Zhiyuan Liu, Cunchao Tu, Maosong Sun
Published in: CoRR (2021)
Keyphrases
- language model
- pre-trained
- document retrieval
- ad hoc information retrieval
- query terms
- information retrieval
- document ranking
- vector space model
- language modeling
- query expansion
- n-gram
- query specific
- test collection
- document length
- probabilistic model
- relevance model
- retrieval model
- word clouds
- relevant documents
- pseudo feedback
- speech recognition
- retrieved documents
- multiword
- training data
- retrieval effectiveness
- language modeling framework
- pseudo relevance feedback
- training examples
- document collections
- information retrieval systems
- smoothing methods
- retrieval systems
- word segmentation
- web documents
- inter-document similarities
- tf-idf
- document clustering
- term frequency
- text documents
- information extraction
- decision trees