Frame2: A FrameNet-based Multimodal Dataset for Tackling Text-image Interactions in Video.
Frederico Belcavello, Tiago Timponi Torrent, Ely Edison Matos, Adriana S. Pagano, Maucha Gamonal, Natália Sigiliano, Lívia Vicente Dutra, Helen de Andrade Abreu, Mairon Samagaio, Mariane Carvalho, Franciany Campos, Gabrielly Azalim, Bruna Mazzei, Mateus Fonseca de Oliveira, Ana Carolina Loçasso Luz, Lívia Pádua Ruiz, Júlia Bellei, Amanda Pestana, Josiane Costa, Iasmin Rabelo, Anna Beatriz Silva, Raquel Roza, Mariana Souza Mota, Igor Oliveira
Published in: LREC/COLING (2024)
Keyphrases
- image frames
- input image
- key frames
- video frames
- weakly labeled
- image data
- single frame
- multiscale
- image features
- temporal continuity
- successive frames
- input video
- image classification
- image content
- image segmentation
- video signals
- textual descriptions
- multimedia
- visual data
- adjacent frames
- image representation
- multiple modalities
- street view
- neighboring frames
- video streams
- text mining
- text detection
- image retrieval
- video search
- semantic analysis
- reference frame
- image sequences
- multi modal
- video sequences
- news video
- video data
- visual features
- semantic labels
- natural language processing
- natural scene images
- information retrieval