Publication: Auxiliary audio-textual modalities for better action recognition on vision-specific annotated videos.