Cost-effective End-to-end Information Extraction for Semi-structured Document Images.
Wonseok Hwang, Hyunji Lee, Jinyeong Yim, Geewook Kim, Minjoon Seo
Published in: CoRR (2021)
Keyphrases
- cost effective
- end to end
- semi structured
- document images
- information extraction
- structured data
- text mining
- document image analysis
- free text
- low cost
- web documents
- data extraction
- document analysis
- natural language processing
- information retrieval
- cost effectiveness
- semi structured data
- optical character recognition
- congestion control
- natural language
- web mining
- web data sources
- wrapper generation
- text processing
- web information extraction
- page layout
- text documents
- page segmentation
- printed documents
- machine learning
- unstructured data
- machine translation
- scanned documents
- document image retrieval
- unstructured text
- data center
- data model
- textual data
- printed text
- wrapper induction
- website
- knowledge base