MataDoc: Margin and Text Aware Document Dewarping for Arbitrary Boundary.
Beiya Dai, Xing Li, Qunyi Xie, Yulin Li, Xiameng Qin, Chengquan Zhang, Kun Yao, Junyu Han
Published in: CoRR (2023)
Keyphrases
- document images
- document analysis
- printed documents
- document processing
- text documents
- text lines
- digital documents
- keywords
- information retrieval
- text content
- document content
- scanned documents
- web documents
- text mining
- textual content
- textual documents
- text clustering
- multimedia documents
- printed text
- text recognition
- text corpus
- optical character recognition
- electronic documents
- text collections
- scientific papers
- handwritten documents
- text classifiers
- retrieval engine
- support vector
- text summarization
- scientific documents
- automatic text summarization
- document categorization
- digital camera
- document corpus
- structured documents
- latent semantic analysis
- tf-idf
- extractive summarization
- page layout analysis
- pdf files
- authorship attribution
- technical papers
- keyword extraction
- noun phrases
- textual data
- document classification
- document retrieval
- database
- document structure
- text categorization
- free text
- topic models