The Challenges of Large-Scale, Web-Based Language Datasets: Word Length and Predictability Revisited.
Stephan C. Meylan, Thomas L. Griffiths
Published in: Cogn. Sci. (2021)
Keyphrases
- real world
- real life
- benchmark datasets
- language learning
- lessons learned
- scientific data analysis
- natural language
- programming language
- uci machine learning repository
- lingua franca
- million images
- key issues
- small scale
- training dataset
- learning platform
- grand challenge
- information systems
- scripting languages
- database
- hashing methods
- training data