LMDX: Language Model-based Document Information Extraction and Localization.
Vincent Perot, Kai Kang, Florian Luisier, Guolong Su, Xiaoyu Sun, Ramya Sree Boppana, Zilong Wang, Jiaqi Mu, Hao Zhang, Nan Hua
Published in: CoRR (2023)
Keyphrases
- information extraction
- web documents
- information retrieval
- text documents
- unstructured documents
- natural language
- text summarization
- programming language
- natural language processing
- intended meaning
- precision and recall
- text mining
- document classification
- document collections
- question answering
- vector space model
- machine learning
- named entity recognition
- extensible markup language
- word sense disambiguation
- document clustering
- semi-structured
- document retrieval
- web mining
- document images
- information retrieval systems
- named entities
- text classification
- co-occurrence
- document representation
- knowledge discovery
- structured documents
- logical structure
- data model
- localization error
- web pages
- data mining
- multilingual documents