Improving Information Extraction on Business Documents with Specific Pre-Training Tasks.
Thibault Douzon, Stefan Duffner, Christophe Garcia, Jérémy Espinas
Published in: CoRR (2023)
Keyphrases
- information extraction
- information retrieval
- free text
- web documents
- text documents
- unstructured documents
- text mining
- domain specific
- document collections
- natural language processing
- electronic commerce
- natural language text
- textual data
- document classification
- data mining
- XML documents
- machine learning
- document clustering
- document retrieval
- relevant documents
- data extraction
- information systems
- decision making
- business process
- business processes
- business intelligence
- named entity recognition
- knowledge discovery
- question answering systems
- unstructured text
- text classification
- textual information
- training process
- metadata
- structured data
- natural language
- training set
- information technology
- co-occurrence
- named entities
- transfer learning
- user queries
- question answering