Publication: Multimodal attention-based transformer for video captioning.