WhisperX: Time-Accurate Speech Transcription of Long-Form Audio.

Max Bain, Jaesung Huh, Tengda Han, Andrew Zisserman
Published in: INTERSPEECH (2023)
Keyphrases
  • multimedia
  • visual information
  • machine learning
  • computationally efficient
  • multiscale
  • high accuracy
  • data mining
  • high quality
  • signal processing
  • audio visual
  • cepstral features