Lexical Acquisition from Audio-Visual Streams Using a Multimodal Recurrent State-Space Model.
Soichiro Komura, Katsuyoshi Maeyama, Akira Taniguchi, Tadahiro Taniguchi
Published in: ICDL (2023)
Keyphrases
- state space model
- audio visual
- multi stream
- multi modal
- audio visual speech recognition
- state estimation
- kalman filter
- visual information
- autoregressive
- multimodal fusion
- visual data
- multimedia
- wordnet
- natural language processing
- recurrent neural networks
- hidden markov models
- human body
- kalman filtering
- video sequences
- machine learning