CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing.

Published in: CoRR (2024)

Keyphrases