Cross Domain Assessment of Document to HTML Conversion Tools to Quantify Text and Structural Loss during Document Analysis.
Kyle Goslin, Markus Hofmann
Published in: EISIC (2013)
Keyphrases
- document analysis
- cross domain
- electronic documents
- document images
- document processing
- printed documents
- character recognition
- transfer learning
- image analysis
- text analysis
- text categorization
- sentiment classification
- document image retrieval
- knowledge transfer
- document layout
- word level
- neural network
- web pages
- machine learning
- k nearest neighbor
- handwritten documents
- information extraction
- data mining