A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics.
Martin Gerlach, Francesc Font-Clos
Published in: CoRR (2018)
Keyphrases
- natural language
- statistical analysis
- natural language processing
- natural language text
- clinical data
- question answering
- case study
- project management
- statistical analyses
- natural language generation
- semantic interpretation
- machine learning
- open domain
- information extraction
- statistical methods
- natural language understanding
- qualitative and quantitative
- knowledge representation
- computer science
- natural language interface
- syntactic structures
- current status
- semantic representation
- semantic analysis
- data collection
- quantitative and qualitative
- language learning
- natural language sentences
- data sets
- manually annotated
- noun phrases
- machine translation
- software development
- data analysis
- neural network