3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding.
Muhammad Abdullah Jamal, Omid Mohareri
Published in: CoRR (2023)
Keyphrases
- multi modal
- auto annotation
- semantic concepts
- image data
- image content
- video search
- image representation
- input image
- image features
- multiscale
- high resolution
- image retrieval
- fusing multiple
- image regions
- image collections
- image analysis
- segmentation method
- image segmentation
- multimedia
- uni modal
- cross modal
- visual concepts
- video sequences
- mean shift
- video data
- feature selection
- denoising