Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization.
Yongdong Luo
Haojia Lin
Xiawu Zheng
Yigeng Jiang
Fei Chao
Jie Hu
Guannan Jiang
Songan Zhang
Rongrong Ji
Published in: CoRR (2024)
Keyphrases
visual features
visual information
news video
low level
visual data
visual perception
feature extraction
video content
video retrieval
visual cues
localization method