Publication: TMT-VIS: Taxonomy-aware Multi-dataset Joint Training for Video Instance Segmentation.