Publication: Towards Lossless Head Pruning through Automatic Peer Distillation for Language Models.