Rethinking 3D Dense Caption and Visual Grounding in A Unified Framework through Prompt-based Localization.

Yongdong Luo, Haojia Lin, Xiawu Zheng, Yigeng Jiang, Fei Chao, Jie Hu, Guannan Jiang, Songan Zhang, Rongrong Ji
Published in: CoRR (2024)
Keyphrases
  • visual features
  • visual information
  • news video
  • low level
  • visual data
  • visual perception
  • feature extraction
  • video content
  • video retrieval
  • visual cues
  • localization method