*Qilong Feng (Central South University)
Clustering is a fundamental problem with applications in many fields. The k-means problem is the most commonly studied variant: given n points, identify a set of k cluster centers such that the sum of the squared distances between each input point and its nearest center is minimized. In this talk, we consider two variants of k-means clustering, called constrained k-means and k-means with m outliers, and introduce the following results. (1) We analyze the effectiveness of k-means++ on the two problems, giving the first mathematical analysis in this direction. We show that by sampling a set of more than k cluster centers with k-means++, it is possible to obtain O(1)-approximations for both problems with high constant probability. (2) For the constrained k-means problem, we present a (1+ϵ)-approximation algorithm that runs in O(nd(1891ek/ϵ^2)^{8k/ϵ}/ϵ) time; this is the fastest existing (1+ϵ)-approximation algorithm for the problem. We also consider the constrained 2-means problem and give a (1+ϵ)-approximation algorithm with linear running time. Furthermore, we show that any existing approximation scheme for constrained k-means with running time C(n, d, k, ϵ) can be transformed into a new algorithm with time complexity C(n, d, k, ϵ)/k^{O(1/ϵ)}. (3) For the k-means with m outliers problem, we give the first polynomial-time approximation scheme (PTAS) for the case when k is a fixed number, which considerably improves the previous result, whose running time depends exponentially on both k and m.
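The seeding step analyzed in result (1) can be illustrated with a minimal sketch of standard k-means++ D² sampling, where more than k centers may be requested (oversampling). This is an illustrative implementation of the classical seeding procedure only, not the authors' analysis or algorithm; the function and variable names are assumptions for the example.

```python
import random

def kmeans_pp_seed(points, num_centers, rng=random.Random(0)):
    """k-means++ seeding: pick the first center uniformly at random,
    then pick each subsequent center with probability proportional to
    its squared distance to the nearest center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < num_centers:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(sum((p[i] - c[i]) ** 2 for i in range(len(p)))
                  for c in centers)
              for p in points]
        total = sum(d2)
        if total == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        # D^2 sampling: draw a point with probability d2[p] / total.
        r = rng.random() * total
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if w > 0 and acc >= r:
                centers.append(p)
                break
    return centers

# Oversampling: request more than k centers (here 4 centers for what
# is intuitively a 3-cluster instance), as in the talk's first result.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
centers = kmeans_pp_seed(points, num_centers=4)
```

Once chosen, a center has squared distance zero to itself, so it is never re-sampled; the returned centers are therefore distinct input points.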