Grounding Everything: Emerging Localization Properties in Vision-Language Transformers.
Walid Bousselham, Felix Petersen, Vittorio Ferrari, Hilde Kuehne
Published in: CoRR (2023)
Keyphrases
- programming language
- computer vision
- desirable properties
- real time
- formal language
- english language
- machine learning
- modeling language
- context dependent
- structural properties
- natural language
- language learning
- object recognition
- vision system
- rough sets
- multiscale
- website
- information systems
- information retrieval
- data mining
- data sets