mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus.
Matthieu Futeral, Armel Zebaze, Pedro Ortiz Suarez, Julien Abadji, Rémi Lacroix, Cordelia Schmid, Rachel Bawden, Benoît Sagot
Published in: CoRR (2024)
Keyphrases
- document level
- sentence level
- sentiment classification
- language model
- query expansion
- cross lingual
- coreference resolution
- sentiment analysis
- multi modal
- pseudo relevance feedback
- digital libraries
- document retrieval
- cross language
- parallel corpora
- retrieval strategies
- language independent
- cross language information retrieval
- data mining
- machine learning