Truly Multi-modal YouTube-8M Video Classification with Video, Audio, and Text.
Zhe Wang, Kingsley Kuan, Mathieu Ravaut, Gaurav Manek, Sibo Song, Fang Yuan, Kim Seokhwan, Nancy F. Chen, Luis Fernando D'Haro, Anh Tuan Luu, Hongyuan Zhu, Zeng Zeng, Ngai-Man Cheung, Georgios Piliouras, Jie Lin, Vijay Chandrasekhar
Published in: CoRR (2017)
Keyphrases
- multi modal
- video search
- web videos
- video classification
- audio visual
- cross modal
- video dataset
- video shots
- video content
- single modality
- semantic concepts
- multiple modalities
- video clips
- video data
- multimedia
- high dimensional
- audio features
- video sequences
- video indexing
- multi modality
- information retrieval
- video retrieval
- video segments
- high level
- image sequences
- video frames
- motion trajectories
- action recognition
- visual information
- human actions