P000081

*大川 徐 (北京工业大学) xudc@bjut.edu.cn

K-means problem is a classic NP-hard problem in machine learning and computational geometry, and its seeding algorithms based on the Lloyd's algorithm are hotly studied. In order to cluster the textual data with high dimension in modern data analysis, the spherical K-means clustering is presented. It aims to partition the given points with unit length into K sets so as to minimize the within-cluster sum of cosine dissimilarity. In this talk, we mainly study the seeding algorithm for the spherical K-means clustering, and its special case (with separable sets) and generalized problem ($\alpha$-spherical K-means clustering). For the spherical K-means clustering with separable sets, the approximate algorithm with a constant factor is presented. Moreover, it can be generalized to the $\alpha$-spherical separable K- means clustering. By slickly constructing a useful function, we also show that the famous seeding algorithms such as K-means++ and K-means$||$ of K-means problem can be applied directly to solve the $\alpha$-spherical K-means clustering.

Math formula preview: