Are you talking to ['xem'] or ['x', 'em']? On Tokenization and Addressing Misgendering in LLMs with Pronoun Tokenization Parity.
Anaelia Ovalle, Ninareh Mehrabi, Palash Goyal, Jwala Dhamala, Kai-Wei Chang, Richard S. Zemel, Aram Galstyan, Rahul Gupta
Published in: CoRR (2023)
Keyphrases
- named entities
- biomedical text
- biomedical information retrieval
- character n-grams
- information extraction
- EM algorithm
- database
- Gaussian mixture model
- expectation maximisation
- noun phrases
- cross-language information retrieval
- cross-language
- n-gram
- expectation maximization
- knowledge discovery
- k-means
- image processing