Trained on 100 million words and still in shape: BERT meets British National Corpus.
David Samuel, Andrey Kutuzov, Lilja Øvrelid, Erik Velldal
Published in: CoRR (2023)
Keyphrases
- english words
- word frequencies
- multiword
- word pairs
- text corpora
- text corpus
- training corpus
- natural language text
- unknown words
- shape analysis
- shape representation
- shape descriptors
- shape model
- linguistic information
- person names
- lexical features
- united states
- world knowledge
- noun phrases
- n-gram
- news corpus
- lda model
- word co-occurrence
- neural network
- word frequency
- semantic roles
- document level
- word recognition
- shape features
- document collections
- training set
- keywords