Publication: Rethinking Multi-Modal Alignment in Video Question Answering from Feature and Sample Perspectives.