bbOCR: An Open-source Multi-domain OCR Pipeline for Bengali Documents.
Imam Mohammad Zulkarnain, Shayekh Bin Islam, Md. Zami Al Zunaed Farabe, Md. Mehedi Hasan Shawon, Jawaril Munshad Abedin, Beig Rajibul Hasan, Marsia Haque Meghla, Md. Istiak Hossain Shihab, Syed Mobassir Hossen, Md. Nazmuddoha Ansary, Asif Shahriyar Sushmit, Farig Sadeque
Published in: CoRR (2023)
Keyphrases
- multi domain
- open source
- document processing
- printed documents
- scanned documents
- cross domain
- document images
- document analysis
- indian languages
- page layout
- information retrieval
- search computing
- document collections
- domain specific
- optical character recognition
- scanned images
- news corpus
- heterogeneous networks
- character recognition
- information retrieval systems
- document clustering
- document retrieval
- text lines
- text documents
- role based access control
- news articles
- named entities
- case study
- named entity recognition
- general purpose