PolyViT: Co-training Vision Transformers on Images, Videos and Audio.
Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani
Published in: CoRR (2021)
Keyphrases
- co-training
- visual data
- image data
- input image
- three-dimensional
- object recognition
- multi-view
- semi-supervised
- image processing
- email classification
- data sets
- visual information
- semi-supervised learning
- small number
- image segmentation
- viewpoint
- reinforcement learning
- unlabeled data
- similarity measure
- single-view
- decision trees
- computer vision