Image-Text Alignment using Adaptive Cross-attention with Transformer Encoder for Scene Graphs.
Juyong Song, Sunghyun Choi
Published in: BMVC (2021)
Keyphrases
- input image
- single image
- image data
- image features
- complex scenes
- image regions
- scene understanding
- image alignment
- scene images
- multiscale
- image representation
- image segmentation
- reference images
- image classification
- image content
- imaging process
- image retrieval
- decoding process
- scene matching
- spatial relations
- scene geometry
- outdoor scenes
- scene classification
- vanishing points
- ground plane
- multiple objects
- image collections
- high resolution
- moving objects
- video sequences
- multiple images
- lighting conditions
- 3D scene
- piecewise planar
- three dimensional