ShabbyPages: A Reproducible Document Denoising and Binarization Dataset.
Alexander Groleau, Kok Wei Chee, Stefan Larson, Samay Maini, Jonathan Boarman
Published in: CoRR (2023)
Keyphrases
- denoising
- document images
- image denoising
- image processing
- noisy images
- information retrieval
- total variation
- noise removal
- database
- information retrieval systems
- denoising algorithm
- wavelet domain
- benchmark datasets
- document processing
- retrieval systems
- gaussian noise
- web documents
- keywords
- preprocessing
- search engine
- tf-idf
- denoising methods
- document analysis
- image preprocessing
- grayscale images
- wavelet packet
- document clustering
- document retrieval
- user queries
- test collection
- document collections