A Fast Method to Filter Noisy Parallel Data WMT2023 Shared Task on Parallel Data Curation.
Nguyen-Hoang Minh-Cong, Nguyen Van Vinh, Nguyen Le-Minh
Published in: WMT (2023)
Keyphrases
- synthetic data
- noisy data
- data collection
- statistical methods
- high dimensional data
- data processing
- missing data
- end users
- raw data
- data points
- test data
- data analysis
- data structure
- high quality
- detection method
- data sets
- correlation analysis
- database
- input data
- image data
- prior knowledge
- data mining techniques
- probability distribution
- decision trees
- learning algorithm