Cost-effective End-to-end Information Extraction for Semi-structured Document Images.
Wonseok Hwang, Hyunji Lee, Jinyeong Yim, Geewook Kim, Minjoon Seo
Published in: EMNLP (1) (2021)
Keyphrases
- semi-structured
- end-to-end
- cost-effective
- document images
- information extraction
- structured data
- data extraction
- free text
- low cost
- natural language processing
- text mining
- document image analysis
- cost-effectiveness
- web documents
- information retrieval
- unstructured data
- document analysis
- wrapper generation
- congestion control
- semi-structured data
- machine learning
- web data sources
- unstructured text
- textual data
- data model
- page segmentation
- optical character recognition
- text documents
- natural language
- word spotting
- text processing
- web mining
- scanned documents
- web information extraction
- website
- data center
- active learning
- printed text
- real-time