Publication: FILS: Self-Supervised Video Feature Prediction In Semantic Language Space.