VRDSynth: Synthesizing Programs for Multilingual Visually Rich Document Information Extraction.
Thanh-Dat Nguyen, Tung Do-Viet, Hung Nguyen-Duy, Tuan-Hai Luu, Hung Le, Bach Le, Patanamon Thongtanunam
Published in: CoRR (2024)
Keyphrases
- information extraction
- text documents
- web documents
- information retrieval
- unstructured documents
- natural language processing
- text summarization
- multilingual information retrieval
- text mining
- information retrieval systems
- question answering
- free text
- document clustering
- semi structured
- machine learning
- named entity recognition
- retrieval systems
- document images
- document collections
- machine translation
- precision and recall
- document classification
- document retrieval
- multilingual documents
- structured data
- named entities
- conditional random fields
- real world
- structured documents
- cross language ir
- language resources
- document analysis
- relation extraction
- textual data
- language independent
- cross language
- cross lingual