Publication: Efficient End-to-End Video Question Answering with Pyramidal Multimodal Transformer.