Show, Attend and Tell: Neural Image Caption Generation with Visual Attention.
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, Yoshua Bengio
Published in: ICML (2015)
Keyphrases
- visual attention
- biological vision systems
- salient regions
- visual perception
- input image
- attention mechanism
- visual attention model
- saliency map
- image data
- image retrieval
- eye movements
- image content
- low level
- vision system
- focus of attention
- image segmentation
- natural scenes
- image classification
- eye tracking
- visual scene
- higher level
- image representation
- eye fixations
- visual search
- visual saliency detection
- image features
- object recognition
- image collections
- region of interest
- multiscale
- visual processing
- high resolution
- high level
- visual input
- visual motion
- real time
- machine learning
- stereoscopic images
- image regions
- scale invariant
- bio inspired
- low level features