Gold Standard Bangla OCR Dataset: An In-Depth Look at Data Preprocessing and Annotation Processes.
Hasmot Ali, AKM Shahariar Azad Rabby, Md. Majedul Islam, A. k. m Mahamud, Nazmul Hasan, Fuad Rahman
Published in: EMNLP (Industry Track) (2023)
Keyphrases
- gold standard
- data preprocessing
- mechanical turk
- preprocessing
- semi automatic
- ground truth
- character segmentation
- data mining
- optical character recognition
- preprocessing step
- feature selection
- data cleaning
- web usage mining
- document images
- metadata
- text classification
- data processing
- database
- feature set
- information extraction
- feature extraction
- neural network
- databases