Publication: Multi-modal Voice Activity Detection by Embedding Image Features into Speech Signal.