Publication: Stack-VS: Stacked Visual-Semantic Attention for Image Caption Generation.