WIKIR: A Python Toolkit for Building a Large-scale Wikipedia-based English Information Retrieval Dataset.
Jibril Frej, Didier Schwab, Jean-Pierre Chevallet
Published in: LREC (2020)
Keyphrases
- information retrieval
- document collections
- computing semantic relatedness
- chinese web
- open source
- scripting language
- english language
- cross language ir
- cross language
- benchmark datasets
- search engine
- programming language
- information retrieval systems
- query expansion
- test collection
- retrieval systems
- knowledge base
- natural language
- answer questions
- real world
- english text
- real life
- synthetic datasets
- small scale
- text mining
- ad hoc retrieval
- statistical machine translation
- development tools
- object oriented
- world knowledge
- million images
- explicit semantic analysis
- machine translation
- question answering