GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies.
Marta R. Costa-jussà, Pau Li Lin, Cristina España-Bonet
Published in: LREC (2020)
Keyphrases
- automatic extraction
- natural language text
- relation extraction
- world knowledge
- wikipedia articles
- named entity disambiguation
- entity extraction
- parallel corpus
- digital libraries
- document corpus
- knowledge base
- biomedical literature
- wrapper generation
- topic tracking
- semantic information
- individual differences
- text corpus
- entity ranking
- html documents
- information retrieval
- term extraction
- comparable corpora
- gender differences
- machine translation
- semi-automatically
- information extraction
- wordnet
- link structure
- language independent
- cross language information retrieval
- cross language
- cross lingual