PolyViT: Co-training Vision Transformers on Images, Videos and Audio.
Valerii Likhosherstov, Anurag Arnab, Krzysztof Marcin Choromanski, Mario Lucic, Yi Tay, Mostafa Dehghani
Published in: Trans. Mach. Learn. Res. (2023)
Keyphrases
- co training
- three dimensional
- image data
- input image
- object recognition
- visual data
- semi supervised learning
- multi view
- multimedia
- lighting conditions
- visual information
- data sets
- semi supervised
- supervised learning
- computer vision
- unlabeled data
- multiple views
- high dimensional
- image processing
- single view
- image segmentation
- machine learning