An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling.
Tsu-Jui Fu, Linjie Li, Zhe Gan, Kevin Lin, William Yang Wang, Lijuan Wang, Zicheng Liu
Published in: CoRR (2022)
Keyphrases
- end to end
- scalable video
- video sequences
- video data
- wireless ad hoc networks
- ad hoc networks
- video content
- multimedia
- high bandwidth
- rate allocation
- video streams
- multipath
- admission control
- congestion control
- video frames
- application layer
- visual information
- real time
- cross layer
- text localization and recognition
- computer networks
- video compression
- image coding