Bridging the Gap between Subword and Character Segmentation in Pretrained Language Models.
Shun Kiyono, Sho Takase, Shengzhe Li, Toshinori Sato
Published in: RANLP (2023)
Keyphrases
- language model
- character segmentation
- n-gram
- character recognition
- speech recognition
- out-of-vocabulary
- language modeling
- automatic recognition
- gray scale images
- optical character recognition
- handwritten
- probabilistic model
- information retrieval
- document retrieval
- test collection
- printed documents
- language modelling
- language independent
- retrieval model
- license plate
- gray scale
- query expansion
- Chinese characters
- machine vision
- word segmentation
- query terms
- relevance model
- smoothing methods
- image processing
- handwriting recognition
- document images
- image retrieval