Using Character Ngrams for Word-Level Language Identification in Trilingual Code-Mixed Data (and Even More).
Yves Bestgen
Published in: FIRE (Working Notes) (2023)
Keyphrases
- language identification
- word level
- mixed data
- document images
- n gram
- optical character recognition
- text lines
- data compression
- language independent
- data sets
- knn
- document analysis
- machine translation
- similarity function
- language model
- text classification
- word segmentation
- character recognition
- feature space
- speaker identification
- neural network
- keywords
- bayesian networks
- clustering algorithm