A tool-supported method to extract data and schema from web sites.
Fabrice Estievenart, Aurore François, Jean Henrard, Jean-Luc Hainaut
Published in: WSE (2003)
Keyphrases
- synthetic data
- test data
- data sets
- input data
- prior knowledge
- image data
- databases
- similarity measure
- correlation analysis
- database
- noisy data
- prior information
- data collection
- information loss
- pairwise
- raw data
- statistical methods
- significant improvement
- website
- preprocessing
- missing values
- detection method
- training data
- data mining
- xml documents
- data mining tools
- learning algorithm
- spatial data
- objective function
- training samples
- data structure
- data mining techniques
- support vector machine
- probabilistic model
- data sources
- data model
- k means