CORU: Comprehensive Post-OCR Parsing and Receipt Understanding Dataset.
Abdelrahman Abdallah, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Ibrahim Abdelhalim, Mohamed Elkasaby, Yasser Elbendary, Adam Jatowt
Published in: CoRR (2024)
Keyphrases
- natural language processing
- post processing
- benchmark datasets
- error correction
- preprocessing
- document images
- dependency parsing
- feature set
- language understanding
- optical character recognition
- speech understanding
- real world
- linguistic analysis
- context free grammars
- training dataset
- synthetic datasets
- character recognition
- information retrieval