Optimizing K-Means by Fixing Initial Cluster Centers
Pages: 2101-2107, DOI: http://dx.doi.org/10.14741/ijcet/2014.4.3.176
Abstract
Data mining techniques support business decision making and help predict behaviors and future trends. Clustering is a data mining technique that groups objects with similar characteristics. Because clustering analyzes data objects without consulting a known class label or category, it is an unsupervised technique. K-means is a widely used partitional clustering algorithm, but its performance depends strongly on the initial guess of the cluster centers (centroids), and the final centroids may not be optimal. A good choice of initial centroids is therefore important for K-means. By augmenting K-means with a technique that selects initial centroids based on the sum of distances from each data object to all other data objects, we obtain an algorithm, Farthest Distributed Centroids Clustering (FDCC), that results in better clustering than not only the K-means partitional clustering algorithm but also the agglomerative hierarchical clustering algorithm and the hierarchical partitioning clustering algorithm. Unlike K-means, FDCC does not generate its initial centers randomly and therefore does not produce different results for the same input data.
Keywords: Initial centroids; Recall; Precision; Partitional clustering; Agglomerative hierarchical clustering; Hierarchical partitioning clustering.
Article published in International Journal of Current Engineering and Technology, Vol. 4, No. 3 (June 2014)
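The abstract describes seeding K-means deterministically using the sum of distances from each data object to all other data objects, but it does not spell out the selection rule. The following is a minimal sketch of one way such a seeding could look, not the paper's FDCC procedure. The function names (`distance_sums`, `pick_initial_centroids`, `kmeans`) and the rule used for the second and later centroids (take the point farthest from its nearest already-chosen centroid) are assumptions made for illustration only.

```python
import math


def distance_sums(points):
    """Sum of Euclidean distances from each point to all other points."""
    return [sum(math.dist(p, q) for q in points) for p in points]


def pick_initial_centroids(points, k):
    """Deterministically pick k spread-out points as initial centroids.

    Assumed rule (NOT taken from the paper): start from the point with the
    largest distance sum, then repeatedly add the point whose nearest
    already-chosen centroid is farthest away. Ties go to the lowest index.
    """
    sums = distance_sums(points)
    chosen = [max(range(len(points)), key=lambda i: sums[i])]
    while len(chosen) < k:
        candidates = (i for i in range(len(points)) if i not in chosen)
        chosen.append(
            max(
                candidates,
                key=lambda i: min(math.dist(points[i], points[c]) for c in chosen),
            )
        )
    return [tuple(points[i]) for i in chosen]


def kmeans(points, k, max_iter=100):
    """Standard Lloyd iterations, seeded without any random numbers."""
    centroids = pick_initial_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, 2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Because no random numbers are involved, repeated runs on the same input return the same labels and centroids, which is the property the abstract contrasts with randomly initialized K-means.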