A Hybrid Approach for Transliterated Word-Level Language Identification: CRF with Post-Processing Heuristics.
Somnath Banerjee, Alapan Kuila, Aniruddha Roy, Sudip Kumar Naskar, Paolo Rosso, Sivaji Bandyopadhyay
Published in: FIRE (2014)
Keyphrases
- post-processing
- language identification
- word level
- document images
- conditional random fields
- preprocessing
- document analysis
- machine translation
- language-independent
- speaker identification
- information extraction
- optical character recognition
- image processing
- hidden markov models
- probabilistic model
- target language
- non-stationary
- machine learning
- word segmentation
- feature selection