Efficient Object-Level Visual Context Modeling for Multimodal Machine Translation: Masking Irrelevant Objects Helps Grounding.
Dexin Wang, Deyi Xiong
Published in: CoRR (2021)
Keyphrases
- object level
- machine translation
- low level
- cross modal
- high level
- pixel level
- natural language processing
- higher level
- object class
- information extraction
- multi modal
- natural language
- object detection
- visual features
- image classification
- document collections
- pairwise
- visual information
- visual data
- artificial intelligence