InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation.
Rongyao Fang
Shilin Yan
Zhaoyang Huang
Jingqiu Zhou
Hao Tian
Jifeng Dai
Hongsheng Li
Published in: CoRR (2023)
Keyphrases
multi modal
high dimensional
multi modality
audio visual
cross modal
multimedia
computer vision
image processing
feature selection
semantic concepts