GeBioToolkit: Automatic Extraction of Gender-Balanced Multilingual Corpus of Wikipedia Biographies.
Marta R. Costa-jussà, Pau Li Lin, Cristina España-Bonet
Published in: CoRR (2019)
Keyphrases
- automatic extraction
- natural language text
- relation extraction
- world knowledge
- wikipedia articles
- entity extraction
- named entity disambiguation
- parallel corpus
- digital libraries
- language independent
- document corpus
- text corpus
- wordnet
- cross lingual
- individual differences
- semantic information
- html documents
- document collections
- semantic relations
- chinese english
- cross language
- machine translation
- gender differences
- comparable corpora
- topic tracking
- web data
- cross language information retrieval