Google Books Ngram: Problems of Representativeness and Data Reliability.
Valery D. Solovyev, Vladimir V. Bochkarev, Svetlana S. Akhtyamova
Published in: DAMDID/RCDL (Selected Papers) (2019)