Publication: From FiLM to Video: Multi-turn Question Answering with Multi-modal Context.