Masked Audio Text Encoders are Effective Multi-Modal Rescorers.
Jinglun Cai
Monica Sunkara
Xilai Li
Anshu Bhatia
Xiao Pan
Sravan Bodapati
Published in: CoRR (2023)
Keyphrases
multi modal
audio visual
cross modal
video search
multi modality
information retrieval
single modality
multimedia
image processing
uni modal
multiple modalities
semantic concepts
image annotation
humanoid robot
text retrieval
high dimensional
keywords
text graphics
feature extraction