Publication: Dense Image Representation with Spatial Pyramid VLAD Coding of CNN for Locally Robust Captioning.