Joint Speech Transcription and Translation: Pseudo-Labeling with Out-of-Distribution Data.
Mozhdeh Gheini, Tatiana Likhomanenko, Matthias Sperber, Hendra Setiawan
Published in: ACL (Findings) (2023)
Keyphrases
- data sets
- raw data
- training data
- database
- data processing
- synthetic data
- data distribution
- data points
- complex data
- statistical analysis
- image data
- original data
- data analysis
- data structure
- data quality
- data acquisition
- machine translation system
- experimental data
- missing data
- historical data
- computer systems
- data collection
- small number
- knowledge discovery
- probability distribution
- data sources
- pairwise
- high quality
- clustering algorithm
- website
- data mining