Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering.
J. Edward Hu, Abhinav Singh, Nils Holzenberger, Matt Post, Benjamin Van Durme
Published in: CoNLL (2019)
Keyphrases
- clustering method
- clustering algorithm
- k-means
- unsupervised learning
- real world
- categorical data
- data clustering
- real life
- outlier detection
- Monte Carlo
- self-organizing maps
- wide variety
- hierarchical clustering
- graph theoretic
- cluster analysis
- information theoretic
- sampling strategy
- sample size
- data streams
- database
- probability distribution
- document clustering
- high dimensional
- spectral clustering
- fuzzy clustering
- social networks
- data sets
- web scale
- real time