Jamo Pair Encoding: Subcharacter Representation-based Extreme Korean Vocabulary Compression for Efficient Subword Tokenization.
Sangwhan Moon
Naoaki Okazaki
Published in:
LREC (2020)
Keyphrases
encoding scheme
image representation
data compression
bit string
pairwise
probabilistic model
bag of words
named entities
compression ratio
spoken document retrieval
main idea consists