X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks.
Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto
Published in: CoRR (2022)
Keyphrases
- real time
- vision system
- visually guided
- management system
- description languages
- layered architecture
- information systems
- programming language
- computer vision
- software architecture
- natural language
- language processing
- pairwise
- natural language processing
- language learning
- network architecture
- agent communication
- neural network