Muhammad Maaz
Publication Activity (10 Years)
Years Active: 2022-2024
Publications (10 Years): 14
Top Topics
Detecting Objects
Normalized Correlation
Cluttered Scenes
Language Model
Top Venues
CoRR
CVPR
ECCV Workshops (7)
ACL (1)
Publications
Muhammad Maaz, Hanoona Abdul Rasheed, Abdelrahman M. Shaker, Salman H. Khan, Hisham Cholakkal, Rao Muhammad Anwer, Tim Baldwin, Michael Felsberg, Fahad Shahbaz Khan
PALO: A Polyglot Large Multimodal Model for 5B People.
CoRR
(2024)
Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Fahad Shahbaz Khan
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding.
CoRR
(2024)
Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Fahad Khan
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models.
ACL (1)
(2024)
Abdelrahman Shaker, Muhammad Maaz, Hanoona Abdul Rasheed, Salman H. Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications.
CoRR
(2023)
Muhammad Maaz, Hanoona Abdul Rasheed, Salman H. Khan, Fahad Shahbaz Khan
Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models.
CoRR
(2023)
Muhammad Uzair Khattak, Hanoona Abdul Rasheed, Muhammad Maaz, Salman H. Khan, Fahad Shahbaz Khan
MaPLe: Multi-modal Prompt Learning.
CVPR
(2023)
Shehan Munasinghe, Rusiru Thushara, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Mubarak Shah, Fahad Khan
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models.
CoRR
(2023)
Hanoona Abdul Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman H. Khan, Hisham Cholakkal, Rao Muhammad Anwer, Eric Xing, Ming-Hsuan Yang, Fahad Shahbaz Khan
GLaMM: Pixel Grounding Large Multimodal Model.
CoRR
(2023)
Hanoona Abdul Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman H. Khan, Fahad Shahbaz Khan
Fine-tuned CLIP Models are Efficient Video Learners.
CVPR
(2023)
Abdelrahman Shaker, Muhammad Maaz, Hanoona Abdul Rasheed, Salman H. Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications.
ICCV
(2023)
Hanoona Abdul Rasheed, Muhammad Maaz, Muhammad Uzair Khattak, Salman H. Khan, Fahad Shahbaz Khan
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection.
NeurIPS
(2022)
Muhammad Maaz, Abdelrahman Shaker, Hisham Cholakkal, Salman H. Khan, Syed Waqas Zamir, Rao Muhammad Anwer, Fahad Shahbaz Khan
EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Applications.
ECCV Workshops (7)
(2022)
Hanoona Abdul Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Fine-tuned CLIP Models are Efficient Video Learners.
CoRR
(2022)
Abdelrahman Shaker, Muhammad Maaz, Hanoona Abdul Rasheed, Salman Khan, Ming-Hsuan Yang, Fahad Shahbaz Khan
UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation.
CoRR
(2022)