3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding.
Muhammad Abdullah Jamal, Omid Mohareri
Published in: WACV (2024)
Keyphrases
- multi modal
- auto annotation
- input image
- image features
- multiscale
- automatic image annotation
- image segmentation
- multiple modalities
- semantic concepts
- image analysis
- multi modality
- image content
- video search
- high resolution
- low level
- image data
- uni modal
- video sequences
- audio visual
- visual recognition
- similarity measure
- multimedia
- image collections
- segmentation method
- image representation
- image classification
- image retrieval
- high dimensional