Publication: A standardized Project Gutenberg corpus for statistical analysis of natural language and quantitative linguistics.