Towards Tabular Data Extraction From Richly-Structured Documents Using Supervised and Weakly-Supervised Learning.
Arnab Ghosh Chowdhury, Martin ben Ahmed, Martin Atzmueller
Published in: ETFA (2022)
Keyphrases
- data extraction
- structured documents
- weakly supervised learning
- weakly supervised
- multiple instance learning
- semi structured
- information retrieval systems
- web documents
- xml documents
- semi supervised
- data integration
- object detection
- web pages
- object class
- topic models
- information retrieval
- supervised learning
- query language
- query interface
- information extraction
- relevant documents
- learning algorithm
- unsupervised learning
- named entities
- machine learning
- text mining
- feature selection