GUMBY - A Free, Balanced, and Rich English Web Corpus.
Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes
Published in: LREC (2020)
Keyphrases
- broad coverage
- website
- Chinese web
- web applications
- link grammar
- person names
- specific domains
- internet usage
- English language
- training corpus
- language learning
- statistical machine translation
- semantic web
- web documents
- wide coverage
- natural language
- web pages
- multiword
- manually annotated
- web technologies
- web resources
- information sources
- newspaper articles
- parallel corpus
- web data
- Penn Treebank
- web mining