OccGen: Selection of Real-world Multilingual Parallel Data Balanced in Gender within Occupations.
Marta R. Costa-jussà, Christine Basta, Oriol Domingo, André Rubungo
Published in: NeurIPS (2022)
Keyphrases
- data sets
- synthetic data
- real world
- raw data
- training data
- complex data
- data distribution
- data analysis
- data quality
- original data
- image data
- search engine
- high quality
- end users
- data mining techniques
- data collection
- statistical methods
- log data
- sensor data
- high dimensional data
- data processing
- small number
- knowledge discovery
- data points
- probability distribution
- databases