Automatic extraction of non-textual information in web document and their classification.
Martina Zachariasova, Róbert Hudec, Miroslav Benco, Patrik Kamencay
Published in: TSP (2012)
Keyphrases
- textual information
- automatic extraction
- web documents
- keywords
- visual content
- visual information
- financial reports
- web pages
- semi-structured
- information extraction
- low-level features
- contextual information
- text documents
- feature extraction
- feature space
- unsupervised learning
- machine learning
- high-level
- n-gram
- text classification
- image classification
- text mining
- supervised learning
- feature selection