SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering.
Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan L. Yuille
Published in: CoRR (2022)
Keyphrases
- question answering
- visual context
- visual scene
- semantic context
- temporal context
- information retrieval
- object detection
- scene interpretation
- information extraction
- natural language
- natural language processing
- visual information
- question answering systems
- visual features
- audio visual
- video annotation
- high level
- temporal information
- low level
- video sequences