Hybridized Character-Word Embedding for Korean Traditional Document Translation.
Hosang Yu, Gil-Jin Jang, Minho Lee
Published in: ICONIP (3) (2018)
Keyphrases
- machine translation system
- printed documents
- word level
- source language
- document images
- statistical machine translation
- translation model
- machine translation
- latent topics
- printed text
- target language
- keywords
- training corpus
- English words
- document analysis
- text lines
- character recognition
- optical character recognition
- retrieval systems
- word sense disambiguation
- web documents
- morphological analysis
- text corpus
- query words
- cross-language information retrieval
- term frequency
- short list
- language model
- compound words
- handwritten words
- word order
- target word
- spoken document retrieval
- information retrieval
- cross-lingual
- document retrieval
- word sense
- document space
- noun phrases
- information retrieval systems
- text input
- parallel corpus
- vector space
- text documents
- term weighting
- word clouds
- syntactic analysis
- relevant documents
- handwritten documents
- bilingual dictionaries
- vector space model
- document representation
- language independent
- query translation
- cursive handwriting
- out-of-vocabulary
- related documents
- word recognition
- word segmentation
- n-gram
- topic models
- document collections
- co-occurrence
- information extraction
- search engine