OmniTokenizer: A Joint Image-Video Tokenizer for Visual Generation.
Junke Wang, Yi Jiang, Zehuan Yuan, Bingyue Peng, Zuxuan Wu, Yu-Gang Jiang
Published in: CoRR (2024)
Keyphrases
- visual data
- visual cues
- low level
- single image
- image features
- image analysis
- visual appearance
- image classification
- input image
- image content
- image retrieval
- image data
- multiscale
- visual perception
- segmentation method
- image segmentation
- image collections
- mid level
- visually similar
- image representation
- spatial relations
- image frames
- static images
- spatial information
- test images
- key frames
- visual concepts
- visual vocabulary
- computer vision
- video files
- visual input
- multimedia
- video images
- semantic labels
- web images
- low level features
- image regions
- visual features
- feature points
- edge detection
- high resolution
- video sequences
- video retrieval
- video content
- video data
- visual analysis
- image quality
- visual effects
- image sequences
- global image statistics