QueryMamba: A Mamba-Based Encoder-Decoder Architecture with a Statistical Verb-Noun Interaction Module for Video Action Forecasting @ Ego4D Long-Term Action Anticipation Challenge 2024.
Zeyun Zhong, Manuel Martin, Frederik Diederichs, Juergen Beyerer
Published in: CoRR (2024)