AMALGUM - A Free, Balanced, Multilayer English Web Corpus.
Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes
Published in: CoRR (2020)
Keyphrases
- website
- link grammar
- web applications
- English language
- Chinese web
- open domain
- natural language
- broad coverage
- statistical machine translation
- linguistic features
- web documents
- web pages
- person names
- web resources
- web mining
- web data
- end users
- wide coverage
- information access
- language learning
- cross lingual
- user generated content
- web scale
- manually annotated
- multiword
- web content
- language model
- textual features
- machine translation
- information sources