Amalgamated Approach for Devanagari Script Corpus for OCR & Demographic Purpose and XML for Linguistic Annotation.
Maninder Singh Nehra, Neeta Nain, Mushtaq Ahmed, Prakash Choudhary, Deepa Modi
Published in: SITIS (2017)
Keyphrases
- character recognition
- hand crafted
- metadata
- linguistic features
- xml documents
- annotated corpus
- optical character recognition
- linguistic information
- xml data
- semantic annotation
- databases
- natural language processing
- reference resolution
- recognition scheme
- natural language
- preprocessing
- active learning
- word recognition
- data model
- data integration
- linguistic patterns
- xml format
- relational databases
- natural language text
- xml databases
- post processing
- automatic annotation
- markup language
- machine vision
- xml schema
- data exchange
- document images
- document processing
- printed documents
- named entities
- database
- text recognition
- phrase structure