WIKIR: A Python toolkit for building a large-scale Wikipedia-based English Information Retrieval Dataset.
Jibril Frej, Didier Schwab, Jean-Pierre Chevallet
Published in: CoRR (2019)
Keyphrases
- information retrieval
- document collections
- chinese web
- computing semantic relatedness
- test collection
- cross language ir
- scripting language
- small scale
- open source
- knowledge base
- english language
- cross language
- search engine
- information extraction
- million images
- information retrieval systems
- cross lingual
- machine translation
- information access
- programming language
- real life
- language modeling
- natural language
- linguistic analysis
- retrieval model
- language model
- benchmark datasets
- real world
- query expansion
- semantic information
- text retrieval
- wikipedia articles
- document corpus
- wordnet
- probabilistic model
- statistical machine translation
- target language
- link structure
- synthetic datasets
- named entities
- vector space model
- relevant documents
- language learning