Creating a ground truth multilingual dataset of news and talk show transcriptions through crowdsourcing.
Rachele Sprugnoli, Giovanni Moretti, Luisa Bentivogli, Diego Giuliani
Published in: Lang. Resour. Evaluation (2017)
Keyphrases
- ground truth
- manually labeled
- broadcast news
- ground truth data
- language resources
- gold standard
- quantitative evaluation
- news articles
- mechanical turk
- benchmark datasets
- digital libraries
- high quality
- social media
- language independent
- web news
- test images
- cross lingual
- feature set
- synthetic datasets
- user generated content
- segmented images
- website
- metadata
- database