Information Extraction from Visually Rich Documents with Font Style Embeddings.
Ismail Oussaid, William Vanhuffel, Pirashanth Ratnamogan, Mhamed Hajaiej, Alexis Mathey, Thomas Gilles
Published in: CoRR (2021)
Keyphrases
- information extraction
- text documents
- free text
- web documents
- information retrieval
- unstructured documents
- vector space
- natural language text
- text mining
- textual data
- unstructured text
- information retrieval systems
- document classification
- document collections
- natural language processing
- named entities
- named entity recognition
- semi-structured
- web mining
- authorship attribution
- retrieval systems
- low dimensional
- document retrieval
- document image understanding
- metadata
- document analysis
- text processing
- relation extraction
- vector space model
- character recognition
- structured data
- document clustering
- user queries
- machine learning
- machine translation
- relevant documents
- Euclidean space
- question answering
- dimensionality reduction
- information extraction systems
- WordNet
- manifold learning
- optical character recognition
- principal component analysis