Publication: Temporally Multi-Modal Semantic Reasoning with Spatial Language Constraints for Video Question Answering.