MultiLegalPile: A 689GB Multilingual Legal Corpus.
Joel Niklaus, Veton Matoshi, Matthias Stürmer, Ilias Chalkidis, Daniel E. Ho
Published in: CoRR (2023)
Keyphrases
- parallel corpus
- manually annotated
- case law
- high speed
- legal knowledge
- legal information retrieval
- digital libraries
- cross lingual
- cross language information retrieval
- open domain
- training corpus
- wide coverage
- legal texts
- statistical machine translation
- comparable corpora
- Chinese English
- legal documents
- legal reasoning
- language independent
- lexical knowledge
- test set
- machine learning
- supervised machine learning
- linguistic features
- word pairs
- legal information
- sentence level
- artificial intelligence and law
- information retrieval