BaDLAD: A Large Multi-Domain Bengali Document Layout Analysis Dataset.
Md. Istiak Hossain Shihab, Md Rakibul Hasan, Mahfuzur Rahman Emon, Syed Mobassir Hossen, Md. Nazmuddoha Ansary, Intesur Ahmed, Fazle Rabbi Rakib, Shahriar Elahi Dhruvo, Souhardya Saha Dip, Akib Hasan Pavel, Marsia Haque Meghla, Md. Rezwanul Haque, Sayma Sultana Chowdhury, Farig Sadeque, Tahsin Reasat, Ahmed Imtiaz Humayun, Asif Shahriyar Sushmit
Published in: CoRR (2023)
Keyphrases
- multi domain
- cross domain
- spoken dialogue systems
- search computing
- domain specific
- document collections
- document retrieval
- document images
- information retrieval
- general purpose
- information retrieval systems
- named entities
- text documents
- document clustering
- automatic summarization
- news corpus
- cross language
- active learning