DocumentCLIP: Linking Figures and Main Body Text in Reflowed Documents.
Fuxiao Liu, Hao Tan, Chris Tensmeyer
Published in: CoRR (2023)
Keyphrases
- text documents
- information retrieval
- free text
- keywords
- document analysis
- digital documents
- web documents
- text retrieval
- text collections
- text mining
- newspaper articles
- text analysis
- textual content
- document collections
- document content
- multimedia documents
- plagiarism detection
- textual information
- textual data
- textual documents
- electronic documents
- latent semantic analysis
- document categorization
- text data
- text clustering
- document processing
- text information
- historical documents
- text content
- related documents
- natural language text
- relevant documents
- semantic information
- information retrieval systems
- linguistic analysis
- text categorization
- topic segmentation
- printed documents
- multiword
- text classifiers
- xml documents
- natural language processing
- document retrieval
- retrieval engine
- journal articles
- text classification
- document structure
- entity extraction
- text corpora
- key concepts
- automatic categorization
- page layout
- scientific documents
- spoken documents
- text corpus
- scientific literature
- word pairs
- information extraction
- retrieval systems
- human body