CTC-aligned Audio-Text Embedding for Streaming Open-vocabulary Keyword Spotting.
Sichen Jin, Youngmoon Jung, Seungjin Lee, Jaeyoung Roh, Changwoo Han, Hoonyoung Cho
Published in: CoRR (2024)
Keyphrases
- keyword spotting
- printed documents
- speech processing
- keywords
- speech recognition
- handwritten documents
- hidden Markov models
- signal processing
- multimedia
- text mining
- English text
- document images
- document analysis
- information retrieval
- media streams
- text retrieval
- language independent
- vector space
- speaker identification
- visual information
- character recognition
- audio visual
- text analysis
- text processing
- artificial intelligence
- natural language generation
- machine learning
- information extraction
- natural language processing
- text documents
- text to speech
- broadcast news
- multimedia systems
- optical character recognition
- text data