Difei Gao
ORCID
Publication Activity (10 Years)
Years Active: 2015-2024
Publications (10 Years): 45
Top Topics
Diffusion Models
Natural Language
Question Answering
Online Video
Top Venues
CoRR
ICCV
CVPR
ECCV
Publications
Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces.
CoRR
(2024)
Kevin Qinghong Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou
VideoGUI: A Benchmark for GUI Automation from Instructional Videos.
CoRR
(2024)
Qinchen Wu, Difei Gao, Kevin Qinghong Lin, Zhuoyu Wu, Xiangwu Guo, Peiran Li, Weichen Zhang, Hengxu Wang, Mike Zheng Shou
GUI Action Narrator: Where and When Did That Action Take Place?
CoRR
(2024)
Henry Hengyuan Zhao, Pan Zhou, Difei Gao, Mike Zheng Shou
LOVA3: Learning to Visual Question Answering, Asking and Assessment.
CoRR
(2024)
Kevin Qinghong Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Zheng Shou
Learning Video Context as Interleaved Multimodal Sequences.
CoRR
(2024)
Ziyi Bai, Ruiping Wang, Difei Gao, Xilin Chen
Event Graph Guided Compositional Spatial-Temporal Reasoning for Video Question Answering.
IEEE Trans. Image Process.
33 (2024)
Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin Qinghong Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Zheng Shou
VideoLLM-online: Online Video Large Language Model for Streaming Video.
CoRR
(2024)
Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou
UniVTG: Towards Unified Video-Language Temporal Grounding.
CoRR
(2023)
Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen
CRIC: A VQA Dataset for Compositional Reasoning on Vision and Commonsense.
IEEE Trans. Pattern Anal. Mach. Intell.
45 (5) (2023)
Muhammet Ilaslan, Chenan Song, Joya Chen, Difei Gao, Weixian Lei, Qianli Xu, Joo Lim, Mike Zheng Shou
GazeVQA: A Video Question Answering Dataset for Multiview Eye-Gaze Task-Oriented Collaborations.
EMNLP
(2023)
Difei Gao, Lei Ji, Luowei Zhou, Kevin Qinghong Lin, Joya Chen, Zihan Fan, Mike Zheng Shou
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.
CoRR
(2023)
Difei Gao, Lei Ji, Zechen Bai, Mingyu Ouyang, Peiran Li, Dongxing Mao, Qinchen Wu, Weichen Zhang, Peiyi Wang, Xiangwu Guo, Hengxu Wang, Luowei Zhou, Mike Zheng Shou
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation.
CoRR
(2023)
Joya Chen, Difei Gao, Kevin Qinghong Lin, Mike Zheng Shou
Affordance Grounding from Demonstration Video to Target Image.
CoRR
(2023)
Weixian Lei, Yixiao Ge, Kun Yi, Jianfeng Zhang, Difei Gao, Dylan Sun, Yuying Ge, Ying Shan, Mike Zheng Shou
ViT-Lens-2: Gateway to Omni-modal Intelligence.
CoRR
(2023)
Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Mike Zheng Shou, Nan Duan
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding.
ACL (1)
(2023)
Zhijian Hou, Lei Ji, Difei Gao, Wanjun Zhong, Kun Yan, Chao Li, Wing-Kwong Chan, Chong-Wah Ngo, Nan Duan, Mike Zheng Shou
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023.
CoRR
(2023)
Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
Mover: Mask and Recovery based Facial Part Consistency Aware Method for Deepfake Video Detection.
CoRR
(2023)
Jay Zhangjie Wu, Xiuyu Li, Difei Gao, Zhen Dong, Jinbin Bai, Aishani Singh, Xiaoyu Xiang, Youzeng Li, Zuwei Huang, Yuanxi Sun, Rui He, Feng Hu, Junhua Hu, Hai Huang, Hanyu Zhu, Xu Cheng, Jie Tang, Mike Zheng Shou, Kurt Keutzer, Forrest N. Iandola
CVPR 2023 Text Guided Video Editing Competition.
CoRR
(2023)
Parantak Singh, You Li, Ankur Sikarwar, Weixian Lei, Difei Gao, Morgan B. Talbot, Ying Sun, Mike Zheng Shou, Gabriel Kreiman, Mengmi Zhang
Learning to Learn: How to Continuously Teach Humans and Machines.
ICCV
(2023)
Kevin Qinghong Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex Jinpeng Wang, Rui Yan, Mike Zheng Shou
UniVTG: Towards Unified Video-Language Temporal Grounding.
ICCV
(2023)
Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Qian Wang, Zheng Qin, Mike Zheng Shou
Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces.
CoRR
(2023)
David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation.
CoRR
(2023)
Joya Chen, Difei Gao, Kevin Qinghong Lin, Mike Zheng Shou
Affordance Grounding from Demonstration Video to Target Image.
CVPR
(2023)
Juan Hu, Xin Liao, Difei Gao, Satoshi Tsutsui, Zheng Qin, Mike Zheng Shou
DeepfakeMAE: Facial Part Consistency Aware Masked Autoencoder for Deepfake Video Detection.
CoRR
(2023)
Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering.
CVPR
(2023)
Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task.
AAAI
(2023)
Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
Egocentric Video-Language Pretraining @ Ego4D Challenge 2022.
CoRR
(2022)
Kevin Qinghong Lin, Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
Egocentric Video-Language Pretraining.
NeurIPS
(2022)
Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding.
CoRR
(2022)
Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, Rong-Cheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
Egocentric Video-Language Pretraining.
CoRR
(2022)
Zhijian Hou, Wanjun Zhong, Lei Ji, Difei Gao, Kun Yan, Wing Kwong Chan, Chong-Wah Ngo, Zheng Shou, Nan Duan
An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022.
CoRR
(2022)
Weixian Lei, Difei Gao, Yuxuan Wang, Dongxing Mao, Zihan Liang, Lingmin Ran, Mike Zheng Shou
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant.
EMNLP (Findings)
(2022)
Yuxuan Wang, Difei Gao, Licheng Yu, Weixian Lei, Matt Feiszli, Mike Zheng Shou
GEB+: A Benchmark for Generic Event Boundary Captioning, Grounding and Retrieval.
ECCV (35)
(2022)
Stan Weixian Lei, Difei Gao, Jay Zhangjie Wu, Yuxuan Wang, Wei Liu, Mengmi Zhang, Mike Zheng Shou
Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task.
CoRR
(2022)
Yuxuan Wang, Difei Gao, Licheng Yu, Stan Weixian Lei, Matt Feiszli, Mike Zheng Shou
GEB+: A benchmark for generic event boundary captioning, grounding and text-based retrieval.
CoRR
(2022)
Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou
AssistQ: Affordance-Centric Question-Driven Task Completion for Egocentric Assistant.
ECCV (36)
(2022)
Benita Wong, Joya Chen, You Wu, Stan Weixian Lei, Dongxing Mao, Difei Gao, Mike Zheng Shou
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant.
CoRR
(2022)
Difei Gao, Luowei Zhou, Lei Ji, Linchao Zhu, Yi Yang, Mike Zheng Shou
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering.
CoRR
(2022)
Stan Weixian Lei, Yuxuan Wang, Dongxing Mao, Difei Gao, Mike Zheng Shou
AssistSR: Affordance-centric Question-driven Video Segment Retrieval.
CoRR
(2021)
Difei Gao, Ruiping Wang, Ziyi Bai, Xilin Chen
Env-QA: A Video Question Answering Benchmark for Comprehensive Understanding of Dynamic Environments.
ICCV
(2021)
Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text.
CoRR
(2020)
Difei Gao, Ke Li, Ruiping Wang, Shiguang Shan, Xilin Chen
Multi-Modal Graph Neural Network for Joint Reasoning on Vision and Scene Text.
CVPR
(2020)
Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen
Learning to Recognize Visual Concepts for Visual Question Answering With Structural Label Space.
IEEE J. Sel. Top. Signal Process.
14 (3) (2020)
Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen
From Two Graphs to N Questions: A VQA Dataset for Compositional Reasoning on Vision and Commonsense.
CoRR
(2019)
Difei Gao, Ruiping Wang, Shiguang Shan, Xilin Chen
Visual Textbook Network: Watch Carefully before Answering Visual Questions.
BMVC
(2017)
Difei Gao, Lili Pan, Risheng Liu, Rui Chen, Mei Xie
Correlated warped Gaussian processes for gender-specific age estimation.
ICIP
(2015)