Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model.

Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath
Published in: CoRR (2024)
Keyphrases
  • context dependent
  • information retrieval
  • statistical model