Otter: A Multi-Modal Model with In-Context Instruction Tuning.

Bo Li, Yuanhan Zhang, Liangyu Chen, Jinghao Wang, Jingkang Yang, Ziwei Liu
Published in: CoRR (2023)
Keyphrases
  • multi-modal
  • high-level
  • multimedia
  • audio-visual
  • metadata
  • similarity measure
  • video sequences
  • low-level
  • semantic concepts