ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision.
Wonjae Kim, Bokyung Son, Ildoo Kim
Published in: ICML (2021)
Keyphrases
- image processing
- programming language
- computer vision
- natural language
- active learning
- vision system
- language learning
- fuzzy logic
- power system
- neural network
- real time
- artificial intelligence
- data sets
- specification language
- language processing
- region of interest
- input image
- image features
- information extraction
- database systems
- information retrieval