Cem Mil Podcasts: A Spoken Portuguese Document Corpus for Multi-modal, Multi-lingual and Multi-dialect Information Access Research.
Ekaterina Garmash, Edgar Tanaka, Ann Clifton, Joana Correia, Sharmistha Jat, Winstead Zhu, Rosie Jones, Jussi Karlgren
Published in: CLEF (2023)
Keyphrases
- multi modal
- multi lingual
- information access
- document corpus
- cross language
- information retrieval
- digital libraries
- search engine
- information retrieval systems
- user experience
- multiple information sources
- speech recognition
- image annotation
- cross lingual
- audio visual
- language independent
- supervised learning
- document clustering
- information seeking
- language model
- video search
- high dimensional
- feature selection