Cascaded cross-modal transformer for audio-textual classification.
Nicolae-Catalin Ristea, Andrei Anghel, Radu Tudor Ionescu
Published in: Artif. Intell. Rev. (2024)
Keyphrases
- cross modal
- multi modal
- multimedia
- multimedia retrieval
- visual data
- feature selection
- image retrieval
- feature vectors
- visual recognition
- class labels
- text classification
- image classification
- multimedia information retrieval
- low level
- machine learning
- search engine
- visual similarity
- perceptual information
- supervised learning
- feature space
- keywords
- feature extraction
- metadata