COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning.

Simon Ging, Mohammadreza Zolfaghari, Hamed Pirsiavash, Thomas Brox

Published in: CoRR (2020)

Keyphrases

learning algorithm
inductive learning
neural network
information retrieval
active learning
supervised learning
image classification
background knowledge
text representation