New Classification Method Based on Decision Tree for Web Spam Detection
Pages : 1826-1830
Abstract
Web spam is a serious problem for search engines because the quality of their results is severely degraded by the presence of such pages. Web spamming refers to deceiving the ranking algorithm so that some pages are ranked higher than others in order to divert users. Nowadays, the vast increase in the amount of spam degrades search engine results, and proper classification methods and algorithms are needed to overcome this. Classification is the most common method used for mining rules from large databases. Of the various data mining algorithms available for classification, decision tree mining is the simplest, because its hierarchical structure is easy for users to understand and simplifies the decision-making process. We use C5.0, a modified version of the C4.5 decision tree algorithm. Rules are derived by applying a boosted decision tree algorithm such as C5.0 to the datasets, and these rules are used to build the decision tree, which helps improve accuracy. The data from each dataset is preprocessed and stored in matrix form. The resulting system, which applies the C5.0 algorithm to the public datasets WEBSPAM-UK2006 and WEBSPAM-UK2007, significantly improves the detection of Web spam. This system can also be used to improve classification accuracy.
Keywords: Classification, Classifiers, Data mining, Web spam detection, Decision tree.
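To make the classification step concrete, the following is a minimal Python sketch under stated assumptions. It is not C5.0 itself: it grows small decision trees with C4.5's gain-ratio criterion over a numeric feature matrix and combines them with AdaBoost-style reweighting, which is assumed here as a simplified stand-in for C5.0's own boosting procedure. The matrix `X`, its two feature columns, and the labels `y` are hypothetical toy values, not WEBSPAM-UK2006/2007 data, and the function names (`best_split`, `build_tree`, `boost`, `predict`) are illustrative only.

```python
import math
from collections import Counter

# Hypothetical toy feature matrix (NOT WEBSPAM-UK data): one row per host,
# columns are illustrative content features such as a compression ratio and
# a fraction of popular words. Labels: 1 = spam, 0 = normal.
X = [[4.1, 0.62], [3.9, 0.55], [4.5, 0.70], [1.8, 0.20],
     [2.0, 0.25], [1.6, 0.18], [4.2, 0.30], [1.9, 0.65]]
y = [1, 1, 1, 0, 0, 0, 1, 0]

def class_weights(labels, weights):
    """Total row weight per class label."""
    counts = Counter()
    for lab, w in zip(labels, weights):
        counts[lab] += w
    return counts

def entropy(labels, weights):
    """Weighted class entropy in bits."""
    total = sum(weights)
    return -sum((c / total) * math.log2(c / total)
                for c in class_weights(labels, weights).values() if c > 0)

def best_split(rows, labels, weights):
    """Return (gain_ratio, feature, threshold) maximising C4.5-style gain ratio."""
    total, base, best = sum(weights), entropy(labels, weights), None
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate threshold: midpoint of adjacent values
            sides = [[i for i, r in enumerate(rows) if (r[f] <= t) == left]
                     for left in (True, False)]
            sw = [sum(weights[i] for i in s) for s in sides]
            if min(sw) <= 0:
                continue
            cond = sum(w * entropy([labels[i] for i in s], [weights[i] for i in s])
                       for s, w in zip(sides, sw)) / total
            split_info = -sum((w / total) * math.log2(w / total) for w in sw)
            ratio = (base - cond) / split_info
            if best is None or ratio > best[0]:
                best = (ratio, f, t)
    return best

def build_tree(rows, labels, weights, depth):
    """Grow a depth-limited tree; a leaf is the weighted-majority class label."""
    leaf = class_weights(labels, weights).most_common(1)[0][0]
    split = best_split(rows, labels, weights) if depth > 0 and len(set(labels)) > 1 else None
    if split is None or split[0] <= 0:
        return leaf
    _, f, t = split
    parts = [[i for i, r in enumerate(rows) if (r[f] <= t) == left] for left in (True, False)]
    kids = [build_tree([rows[i] for i in p], [labels[i] for i in p],
                       [weights[i] for i in p], depth - 1) for p in parts]
    return (f, t, kids[0], kids[1])

def predict_tree(node, row):
    """Walk from the root to a leaf: left if row[f] <= t, else right."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def boost(rows, labels, rounds=10, depth=2):
    """AdaBoost-style reweighting, assumed as a stand-in for C5.0's boosting."""
    w = [1.0 / len(rows)] * len(rows)
    ensemble = []  # list of (vote_weight, tree)
    for _ in range(rounds):
        tree = build_tree(rows, labels, w, depth)
        wrong = [predict_tree(tree, r) != lab for r, lab in zip(rows, labels)]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)
        if err >= 0.5:
            break  # tree is no better than chance on the weighted data
        alpha = 0.5 * math.log((1 - err) / max(err, 1e-10))
        ensemble.append((alpha, tree))
        if err == 0:
            break  # perfect fit; further rounds would not change the vote
        # Up-weight misclassified rows, down-weight the rest, renormalise.
        w = [wi * math.exp(alpha if bad else -alpha) for wi, bad in zip(w, wrong)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, row):
    """Weighted vote of the boosted trees: 1 = spam, 0 = normal."""
    score = sum(a * (1 if predict_tree(t, row) == 1 else -1) for a, t in ensemble)
    return 1 if score > 0 else 0

model = boost(X, y)
print([predict(model, r) for r in X])  # predictions on the training rows
print(predict(model, [4.0, 0.5]), predict(model, [1.7, 0.6]))
```

On this deliberately separable toy matrix, a single tree already classifies every row correctly, so boosting stops after the first round. With noisier real features, later rounds would concentrate weight on the hosts that earlier trees misclassified.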