UnIVAL: Unified Model for Image, Video, Audio and Language Tasks.
Mustafa Shukor, Corentin Dancette, Alexandre Ramé, Matthieu Cord
Published in: Trans. Mach. Learn. Res. (2023)
Keyphrases
- unified model
- image data
- visual data
- video files
- input image
- multimedia
- single image
- image retrieval
- image content
- multiscale
- image analysis
- low level
- image features
- visual cues
- image representation
- digital video
- key frames
- static images
- feature points
- programming language
- video images
- image collections
- image regions
- pixel values
- video streams
- temporal continuity
- audio video
- multimedia information
- high resolution
- video content analysis
- video frames
- multimedia processing
- scene change detection
- image frames
- test images
- visual information
- segmentation method
- color images
- image segmentation