*Qilong Feng (Central South University)
Clustering is a fundamental problem with applications in many fields. The k-means problem is the most commonly studied variant: given n points, identify a set of k cluster centers such that the sum of the squared distances between each input point and its nearest center is minimized. In this talk, we consider two variants of k-means clustering, called constrained k-means and k-means with m outliers, and introduce the following results. (1) We analyze the effectiveness of k-means++ on the two problems, giving the first mathematical analysis in this direction. We show that by sampling a set of more than k cluster centers with k-means++, it is possible to obtain O(1)-approximations for both problems with high constant probability. (2) For the constrained k-means problem, we present a (1+ϵ)-approximation algorithm that runs in O(nd(1891ek/ϵ^2)^{8k/ϵ}/ϵ) time; this is the fastest existing (1+ϵ)-approximation algorithm for the problem. We also consider the constrained 2-means problem and give a (1+ϵ)-approximation algorithm with linear running time. Furthermore, we show that any existing approximation scheme for constrained k-means with running time C(n, d, k, ϵ) can be transformed into a new algorithm with time complexity C(n, d, k, ϵ)/k^{O(1/ϵ)}. (3) For the k-means with m outliers problem, we give the first polynomial-time approximation scheme (PTAS) for the case when k is a fixed number, which considerably improves the previous result, whose running time depends exponentially on both k and m.
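The seeding step analyzed in result (1) can be illustrated with a minimal sketch of standard k-means++ D² sampling, where more than k centers may be requested (oversampling). This is an illustrative implementation of the classical seeding procedure only, not the authors' analysis or algorithm; the function and variable names are assumptions for the example.

```python
import random

def kmeans_pp_seed(points, num_centers, rng=random.Random(0)):
    """k-means++ seeding: pick the first center uniformly at random,
    then pick each subsequent center with probability proportional to
    its squared distance to the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < num_centers:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((p[i] - c[i]) ** 2 for i in range(len(p)))
                  for c in centers)
              for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        # D^2 sampling: draw a point with probability d2[p] / total.
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(p)
                break
    return centers

# Oversampling: request more than k centers (here 4 centers for what
# is intuitively a 3-cluster instance), as in the talk's first result.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers = kmeans_pp_seed(points, num_centers=4)
```

Once chosen, a center has squared distance zero to itself, so it is never re-sampled; the returned centers are therefore distinct input points.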