DocTr: Document Transformer for Structured Information Extraction in Documents.
Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
Published in: ICCV (2023)
Keyphrases
- information extraction
- text documents
- web documents
- unstructured documents
- structured data
- information retrieval
- document processing
- document classification
- text mining
- document collections
- free text
- unstructured text
- document clustering
- semi-structured documents
- relevant documents
- semi-structured
- cross document
- information retrieval systems
- electronic documents
- document representation
- retrieval systems
- natural language processing
- document content
- textual data
- digital documents
- structured documents
- vector space model
- document similarity
- document analysis
- document archives
- natural language text
- keywords
- document retrieval
- textual content
- question answering
- machine learning
- document set
- digital libraries
- document ranking
- document space
- document type
- document images
- user queries
- named entities
- named entity recognition
- query terms
- test collection
- semantic information
- document relevance
- text collections
- printed documents
- retrieved documents
- document level
- text summarization
- multimedia documents
- similar documents
- topic hierarchy
- document repository
- term frequency
- machine translation
- topic models
- query expansion
- scanned documents
- document structure
- document summarization
- tf-idf
- metadata
- text classifiers
- xml documents
- logical structure
- pdf files
- web pages
- text classification