Multimodal Transformer With Multi-View Visual Representation for Image Captioning.
Jun Yu, Jing Li, Zhou Yu, Qingming Huang
Published in: IEEE Trans. Circuits Syst. Video Technol. (2020)
Keyphrases
- multi view
- visual representation
- single view
- multiple views
- input image
- multi view images
- 3D objects
- depth map
- multi views
- three dimensional
- single image
- semi supervised
- camera calibration
- bundle adjustment
- high resolution
- test images
- range images
- multi view learning
- feature points
- surface reconstruction
- multiple cameras
- view synthesis
- light field
- image matching
- free viewpoint
- geometric constraints
- camera parameters
- point cloud
- image regions
- video sequences
- similarity measure