Publication: Listen and Look: Multi-Modal Aggregation and Co-Attention Network for Video-Audio Retrieval.