Inadequacy of the chi-squared test to examine vocabulary differences between corpora.
Yves Bestgen
Published in: Lit. Linguistic Comput. (2014)
Keyphrases
- chi squared
- information gain
- statistically significant
- natural language processing
- mutual information
- data mining
- pattern recognition
- artificial neural networks
- image processing
- feature extraction
- keywords
- data sets
- simulated annealing
- test cases
- learning algorithm
- test data
- information retrieval
- statistical machine translation
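The title refers to the common practice of testing each word's frequency in two corpora with a chi-squared test. For reference, here is a minimal sketch of the conventional Pearson chi-squared statistic on a 2×2 table of token counts (word vs. all other tokens, corpus A vs. corpus B), without Yates' continuity correction, plus its p-value for 1 degree of freedom. The function names are hypothetical, and this is not the procedure or the alternative proposed in the paper. Note that this computation treats every token as an independent observation.

```python
import math


def chi_squared_2x2(word_a: int, total_a: int, word_b: int, total_b: int) -> float:
    """Pearson chi-squared (no continuity correction) for one word in two corpora.

    word_a / word_b: occurrences of the word in corpus A / B.
    total_a / total_b: total number of tokens in corpus A / B.
    Hypothetical helper; every token is counted as an independent observation.
    """
    n = total_a + total_b
    word_total = word_a + word_b
    if not (0 < word_total < n):
        raise ValueError("the word must occur, but not make up every token")
    # 2x2 contingency table: rows = corpus, columns = (this word, other tokens)
    observed = [
        [word_a, total_a - word_a],
        [word_b, total_b - word_b],
    ]
    row_totals = [total_a, total_b]
    col_totals = [word_total, n - word_total]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under the null hypothesis of equal relative frequency
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def chi_squared_p_value(stat: float) -> float:
    """Upper-tail p-value for a chi-squared statistic with 1 degree of freedom."""
    # With 1 df, chi-squared = Z**2, so P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    stat = chi_squared_2x2(word_a=20, total_a=100, word_b=10, total_b=100)
    print(f"chi2 = {stat:.4f}, p = {chi_squared_p_value(stat):.4f}")
```

Because the token is the unit of observation, the statistic grows with corpus size for a fixed difference in relative frequency, which is one reason to be cautious when interpreting it for vocabulary comparisons.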