Multi-Modal Hallucination Control by Visual Information Grounding.
Alessandro Favero, Luca Zancato, Matthew Trager, Siddharth Choudhary, Pramuditha Perera, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto
Published in: CoRR (2024)
Keyphrases
- multi modal
- visual information
- audio visual
- visual features
- low level
- visual cues
- visual data
- multi modality
- textual information
- cross modal
- semantic information
- visual content
- image collections
- eye movements
- high dimensional
- visual information retrieval
- video search
- uni modal
- visual similarity
- semantic concepts
- domain knowledge
- object recognition
- image content
- higher level
- multiple modalities
- similarity measure
- knowledge base