Web document text and images extraction using DOM analysis and natural language processing.
Parag Mulendra Joshi, Sam Liu
Published in: ACM Symposium on Document Engineering (2009)
Keyphrases
- web documents
- information extraction
- natural language processing
- textual information
- image analysis
- text extraction
- web pages
- input image
- image database
- text mining
- computational linguistics
- text processing
- image retrieval
- website
- free text
- image registration
- image data
- three dimensional
- automatically extracted
- edge detection
- named entity recognition
- line extraction
- natural language
- data analysis
- textual data
- automatic extraction
- semantic analysis
- image features
- image collections
- image annotation
- web content
- semi structured
- visual information
- image classification