Tokenization and the Noiseless Channel.
Vilém Zouhar
Clara Meister
Juan Luis Gastaldi
Li Du
Mrinmaya Sachan
Ryan Cotterell
Published in:
CoRR (2023)
Keyphrases
multi-channel
noisy data
noisy images
communication channels
channel coding
biomedical text
biomedical information retrieval
n-gram
multiple access
real-time
genetic algorithm
case study
named entities
character n-grams
channel capacity