Multimodal grid features and cell pointers for scene text visual question answering.
Lluís Gómez, Ali Furkan Biten, Rubèn Pérez Tito, Andrés Mafla, Marçal Rusiñol, Ernest Valveny, Dimosthenis Karatzas
Published in: Pattern Recognit. Lett. (2021)
Keyphrases
- question answering
- low level
- information retrieval
- natural language processing
- natural language
- semantic roles
- feature set
- information extraction
- named entities
- feature selection
- feature extraction
- question answering systems
- feature vectors
- scene text
- visual features
- multimodal
- co-occurrence
- image features
- image retrieval
- feature space
- high level