Document Clustering on Large Scale Data using Ultra Scalable Spectral and Ensemble Clustering
Pages : 203-207
Abstract
As the mass of available information grows every day, merely finding the relevant information is no longer the only task of automatic data clustering systems. These systems are also expected to organize the retrieved information according to its degree of relevance to a given query. The main problem in organizing is deciding which documents are relevant and which are irrelevant. Automated data clustering consists of organizing data into clusters automatically. We propose two novel data clustering algorithms, ultra-scalable spectral clustering (U-SPEC) and ultra-scalable ensemble clustering (U-SENC), based on word sense disambiguation: we use WordNet to eliminate the ambiguity of words, so that each word is replaced by its meaning in context. The closest ancestors of the senses of all the disambiguated words in a given document are then selected as the classes for that document.
Keywords: Data clustering, Large-scale clustering, Spectral clustering, Ensemble clustering, Large-scale datasets.
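The class-selection step described in the abstract, choosing the closest ancestor of a document's word senses, can be illustrated with a minimal sketch. It assumes a tiny hand-written hypernym table (TOY_HYPERNYMS) standing in for WordNet, and the helper names ancestor_path and closest_common_ancestor are hypothetical. It is not the authors' implementation and does not cover the U-SPEC or U-SENC clustering stages.

```python
# Hypothetical toy hypernym table (child -> parent), a stand-in for WordNet.
TOY_HYPERNYMS = {
    "dog": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
    "carnivore": "animal",
    "oak": "tree",
    "tree": "plant",
    "animal": "organism",
    "plant": "organism",
}


def ancestor_path(sense, hypernyms=TOY_HYPERNYMS):
    """Return the chain [sense, parent, grandparent, ..., root]."""
    path = [sense]
    while path[-1] in hypernyms:
        path.append(hypernyms[path[-1]])
    return path


def closest_common_ancestor(senses, hypernyms=TOY_HYPERNYMS):
    """Return the deepest node shared by every sense's path to the root.

    Returns None for an empty input or when the senses share no ancestor.
    """
    if not senses:
        return None
    paths = [ancestor_path(s, hypernyms) for s in senses]
    # Nodes that appear on every path are the common ancestors.
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Walking upward from the first sense, the first shared node is the closest.
    for node in paths[0]:
        if node in shared:
            return node
    return None
```

With this toy table, closest_common_ancestor(["dog", "cat"]) yields "carnivore", while adding "oak" to the list pushes the class up to "organism". A real pipeline would replace the table with WordNet hypernym lookups and pass in senses that have already been disambiguated in context.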