Multimodal Transformer with Multi-View Visual Representation for Image Captioning.
Jun Yu, Jing Li, Zhou Yu, Qingming Huang
Published in: CoRR (2019)
Keyphrases
- multi view
- visual representation
- single view
- input image
- multiple views
- multi views
- multi view images
- three dimensional
- single image
- camera calibration
- test images
- depth map
- multi view stereo
- semi supervised
- 3D objects
- reconstruction method
- multiple cameras
- image synthesis
- multi view reconstruction
- feature points
- expert systems
- bundle adjustment
- surface reconstruction
- user interface
- image matching
- high resolution
- range images
- image regions
- object recognition
- multi view face detection
- free viewpoint
- video sequences
- multi view learning
- multiple viewpoints
- scene reconstruction
- viewpoint
- 3D scene
- learning process
- motion estimation