Publication: HVM-1: Large-scale video models pretrained with nearly 5000 hours of human-like video data.