RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.
Xuying ZhangXiaoshuai SunYunpeng LuoJiayi JiYiyi ZhouYongjian WuFeiyue HuangRongrong JiPublished in: CVPR (2021)
Keyphrases
- visual words
- visual vocabulary
- visual phrases
- bag of words
- image classification
- visual categorization
- medical image retrieval
- bag of visual words
- co occurrence
- image representation
- visual content
- bag of features
- visual features
- scene classification
- image retrieval
- image features
- image descriptors
- spatial information
- semantic context
- spatial relationships
- object categories
- visual information
- feature vectors
- sift descriptors
- object retrieval
- gaussian mixture model
- keypoints
- n gram
- low level
- visual dictionary
- pairwise
- multiscale
- image content
- image matching
- text mining
- bag of words models