A unified scheme of text localization and structured data extraction for joint OCR and data mining.
Yibin Ye, Shenggao Zhu, Jing Wang, Qi Du, Yezhang Yang, Dandan Tu, Lanjun Wang, Jiebo Luo
Published in: IEEE BigData (2018)
Keyphrases
- data extraction
- data mining
- semi structured
- text mining
- text recognition
- web data extraction
- data integration
- html pages
- document processing
- optical character recognition
- printed documents
- real world
- structured data
- information extraction
- web documents
- document analysis
- information retrieval
- ocr systems
- data mining techniques
- website
- web pages
- decision support
- document images
- information integration
- data warehouse
- knowledge discovery
- relational databases
- association rules
- machine learning
- database