Yaya Shi
ORCID
Publication Activity (10 Years)
Years Active: 2018-2024
Publications (10 Years): 20
Top Topics
Uni Modal
Multiple Images
Text Retrieval
Language Model
Top Venues
CoRR
LREC/COLING
CVPR
ACM Trans. Multim. Comput. Commun. Appl.
Publications
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training.
LREC/COLING
(2024)
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training.
CoRR
(2024)
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval.
LREC/COLING
(2024)
Haowei Liu, Yaya Shi, Haiyang Xu, Chunfeng Yuan, Qinghao Ye, Chenliang Li, Ming Yan, Ji Zhang, Fei Huang, Bing Li, Weiming Hu
Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval.
CoRR
(2024)
Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu
MIBench: Evaluating Multimodal Large Language Models over Multiple Images.
CoRR
(2024)
Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.
ICML
(2023)
Yaya Shi, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
Learning Video-Text Aligned Representations for Video Captioning.
ACM Trans. Multim. Comput. Commun. Appl.
19 (2) (2023)
Anwen Hu, Yaya Shi, Haiyang Xu, Jiabo Ye, Qinghao Ye, Ming Yan, Chenliang Li, Qi Qian, Ji Zhang, Fei Huang
mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.
CoRR
(2023)
Yaya Shi, Haowei Liu, Haiyang Xu, Zongyang Ma, Qinghao Ye, Anwen Hu, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval.
ACM Multimedia
(2023)
Haiyang Xu, Qinghao Ye, Xuan Wu, Ming Yan, Yuan Miao, Jiabo Ye, Guohai Xu, Anwen Hu, Yaya Shi, Guangwei Xu, Chenliang Li, Qi Qian, Maofei Que, Ji Zhang, Xiao Zeng, Fei Huang
Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.
CoRR
(2023)
Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.
CoRR
(2023)
Haiyang Xu, Qinghao Ye, Ming Yan, Yaya Shi, Jiabo Ye, Yuanhong Xu, Chenliang Li, Bin Bi, Qi Qian, Wei Wang, Guohai Xu, Ji Zhang, Songfang Huang, Fei Huang, Jingren Zhou
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.
CoRR
(2023)
Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching.
CVPR
(2022)
Zhenbang Li, Yaya Shi, Jin Gao, Shaoru Wang, Bing Li, Pengpeng Liang, Weiming Hu
A Simple and Strong Baseline for Universal Targeted Attacks on Siamese Visual Tracking.
IEEE Trans. Circuits Syst. Video Technol.
32 (6) (2022)
Yaya Shi, Xu Yang, Haiyang Xu, Chunfeng Yuan, Bing Li, Weiming Hu, Zheng-Jun Zha
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching.
CoRR
(2021)
Zhenbang Li, Yaya Shi, Jin Gao, Shaoru Wang, Bing Li, Pengpeng Liang, Weiming Hu
A Simple and Strong Baseline for Universal Targeted Attacks on Siamese Visual Tracking.
CoRR
(2021)
Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zheng-Jun Zha
Object Relational Graph With Teacher-Recommended Learning for Video Captioning.
CVPR
(2020)
Ziqi Zhang, Yaya Shi, Chunfeng Yuan, Bing Li, Peijin Wang, Weiming Hu, Zhengjun Zha
Object Relational Graph with Teacher-Recommended Learning for Video Captioning.
CoRR
(2020)
Ziqi Zhang, Yaya Shi, Jiutong Wei, Chunfeng Yuan, Bing Li, Weiming Hu
VATEX Captioning Challenge 2019: Multi-modal Information Fusion and Multi-stage Training Strategy for Video Captioning.
CoRR
(2019)
Yaya Shi, Fujun Niu, Chengsong Yang, Tao Che, Zhanju Lin, Jing Luo
Permafrost Presence/Absence Mapping of the Qinghai-Tibet Plateau Based on Multi-Source Remote Sensing Data.
Remote. Sens.
10 (2) (2018)