Clustering on Uncertain Data using Kullback Leibler Divergence Measurement based on Probability Distribution
Pages: 2614-2620
Abstract
Cluster analysis is one of the most important data analysis methods and is a complex task: the art of detecting groups of similar objects in large data sets without requiring the groups to be specified by explicit features or prior knowledge of the data. Clustering uncertain data is especially difficult, both in modeling the similarity between uncertain data objects and in developing efficient computational methods. Most previous methods for clustering uncertain data extend partitioning or density-based clustering algorithms and rely on the geometric distance between two uncertain data objects. Such methods cannot handle uncertain objects that are indistinguishable by geometric characteristics, because the distribution associated with each object is not considered. The probability distribution, one of the most important characteristics of an uncertain object, is thus not taken into account when measuring the similarity between two uncertain objects. In this work, the well-known Kullback-Leibler divergence is used to measure the distribution similarity between two uncertain data objects, and its effectiveness is integrated into both partitioning and density-based clustering algorithms to properly cluster uncertain data. Since computing the KL divergence is very costly, this problem is addressed with kernel density estimation, and the fast Gauss transform is employed to further speed up the computation and decrease execution time.
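As a minimal illustration of the similarity measure the abstract describes, the following sketch computes the Kullback-Leibler divergence between two uncertain objects represented as discrete probability distributions over a shared domain. The distributions `p` and `q` are hypothetical examples, not data from the paper; in the paper's setting they would instead be obtained from each object's samples, e.g. via kernel density estimation.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(P || Q) between two discrete
    probability distributions given as equal-length lists.
    Assumes q[i] > 0 wherever p[i] > 0 (absolute continuity)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Two hypothetical uncertain objects over the same three-point domain:
p = [0.1, 0.4, 0.5]
q = [0.8, 0.1, 0.1]

d_pq = kl_divergence(p, q)
d_qp = kl_divergence(q, p)
# KL divergence is asymmetric (D(P||Q) != D(Q||P) in general), which is
# why clustering algorithms using it must fix a direction or symmetrize.
```

A clustering algorithm would use such divergences in place of geometric distances when comparing uncertain objects, so that objects with the same center but different distributions are still distinguishable.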
Keywords: Uncertain data, Clustering, Fast Gauss transform, Probability distribution, KL divergence.
Article published in International Journal of Current Engineering and Technology, Vol.5, No.4 (Aug-2015)