SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition.

Hao Wang, Shuhei Kurita, Shuichiro Shimizu, Daisuke Kawahara
Published in: CoRR (2024)
Keyphrases
  • audio visual speech recognition
  • multi stream
  • human actions
  • video sequences
  • audio visual
  • video frames
  • feature set
  • multi modal
  • image content
  • human activities
  • video content
  • visual data