Overview
- language model
- neural network
- semantic segmentation
- reference resolution
- confidence scores
Publications
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation.
CoRR
Learning Correlation Structures for Vision Transformers.
CoRR
Zero-shot Referring Image Segmentation with Global-Local Context Features.
CoRR
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR.
CoRR
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning.
CVPR
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR.
CVPR