Publication: Image-Text Alignment using Adaptive Cross-attention with Transformer Encoder for Scene Graphs.