Information extraction from scanned documents by stochastic page layout analysis.
Atsuhiro Takasu, Kenro Aihara
Published in: SAC (2008)
Keyphrases
- scanned documents
- information extraction
- page layout analysis
- document images
- text mining
- optical character recognition
- web documents
- text detection
- natural language processing
- information retrieval
- noise removal
- link analysis
- named entities
- computer vision
- feature set
- knowledge discovery
- text documents
- natural language
- multiscale
- image processing
- web pages
- machine learning