DocTr: Document Transformer for Structured Information Extraction in Documents.
Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
Published in: CoRR (2023)
Keyphrases
- information extraction
- text documents
- unstructured documents
- web documents
- structured data
- information retrieval
- document collections
- semi-structured documents
- document processing
- document classification
- free text
- text mining
- relevant documents
- unstructured text
- document clustering
- semi-structured
- document representation
- information retrieval systems
- document content
- textual data
- digital documents
- electronic documents
- document ranking
- document analysis
- cross document
- retrieval systems
- natural language processing
- document structure
- named entities
- document similarity
- document retrieval
- index terms
- structured documents
- named entity recognition
- vector space model
- text classification
- multimedia documents
- document type
- document set
- machine learning
- textual content
- natural language text
- text collections
- document repository
- user queries
- training documents
- document summarization
- related documents
- text summarization
- text categorization
- retrieved documents
- keywords
- document images
- language model
- question answering
- document relevance
- xml format
- tf-idf
- query terms
- scientific documents
- test collection
- natural language
- search engine
- document level
- retrieval strategies
- topic hierarchy
- structured and unstructured data
- pdf files
- document archives
- digital libraries
- semantic information
- machine translation
- printed documents
- logical structure
- term frequency
- text corpus
- text classifiers