An Embarrassingly Simple Method to Mitigate Undesirable Properties of Pretrained Language Model Tokenizers.

Valentin Hofmann, Hinrich Schütze, Janet B. Pierrehumbert
Published in: ACL (2) (2022)
Keyphrases
  • language model
  • probabilistic model
  • context sensitive
  • information retrieval
  • clustering method
  • feature selection
  • n gram