Publication: Clustering for high dimensional categorical data based on text similarity.