Publication: Transformer-Based Multi-modal Proposal and Re-Rank for Wikipedia Image-Caption Matching.