Visual Segmentation for Information Extraction from Heterogeneous Visually Rich Documents.
Ritesh Sarkhel, Arnab Nandi
Published in: SIGMOD Conference (2019)
Keyphrases
- information extraction
- information retrieval
- text documents
- free text
- web documents
- document collections
- segmentation algorithm
- unstructured documents
- natural language text
- level set
- xml documents
- natural language processing
- text mining
- topic segmentation
- image segmentation
- document classification
- metadata
- named entity recognition
- textual data
- information retrieval systems
- medical images
- web mining
- multiscale
- question answering
- unstructured text
- user queries
- machine learning
- document retrieval
- high level
- keywords
- precision and recall
- text lines
- visual representation
- document set
- heterogeneous collections
- document clustering
- image analysis
- edge detection
- retrieval systems
- named entities
- relevant documents
- semantic content
- machine translation
- document analysis
- human observers
- visual information
- structured data
- visual stimuli
- visual features
- perceptual information
- vector space model
- region growing