A Multi-Modal Multilingual Benchmark for Document Image Classification.
Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas
Published in: EMNLP (Findings) (2023)
Keyphrases
- multi modal
- image classification
- multi modality
- multilingual information retrieval
- bag of words
- cross modal
- audio visual
- image representation
- text documents
- visual features
- visual words
- document retrieval
- document collections
- information retrieval
- semantic concepts
- feature extraction
- video search
- digital libraries
- image annotation
- visual recognition
- keywords
- multi label
- high dimensional
- cross lingual
- text mining
- uni modal