Publication: Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning.