DialCLIP: Empowering CLIP as Multi-Modal Dialog Retriever.

Zhichao Yin, Binyuan Hui, Min Yang, Fei Huang, Yongbin Li
Published in: CoRR (2024)
Keyphrases
  • multi modal
  • multi modality
  • video clips
  • image annotation
  • video search
  • audio visual
  • semantic concepts
  • high dimensional
  • cross modal
  • fusing multiple
  • image analysis
  • higher level
  • low level features
  • uni modal