Separation of text and background regions for high performance document image compression.
Wei Fan, Jun Sun, Satoshi Naoi
Published in: DRR (2015)
Keyphrases
- image compression
- text documents
- web documents
- document analysis
- digital documents
- information retrieval
- document processing
- document content
- textual content
- text collections
- keywords
- text mining
- text clustering
- multimedia documents
- document structure
- text content
- scientific papers
- vector quantization
- document images
- wavelet transform
- compression scheme
- document classification
- document set
- structured documents
- scientific documents
- printed documents
- semantic information
- text summarization
- textual documents
- information retrieval systems
- automatic text summarization
- latent semantic analysis
- document categorization
- related documents
- database
- retrieval engine
- page layout analysis
- text classifiers
- document level
- compression ratio
- keyword extraction
- scanned documents
- document corpus
- text retrieval
- pdf files
- technical papers
- textual data
- content and structure
- text corpus
- document clustering
- information extraction
- handwritten text
- document collections
- web pages
- document retrieval
- free text
- tf idf
- bag of words
- text categorization
- text representation
- electronic documents
- text lines