Publication: Enhancing Video Transformers for Action Understanding with VLM-aided Training.