Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP.
Sabrina J. Mielke, Zaid Alyafeai, Elizabeth Salesky, Colin Raffel, Manan Dey, Matthias Gallé, Arun Raja, Chenglei Si, Wilson Y. Lee, Benoît Sagot, Samson Tan
Published in: CoRR (2021)