Classifying Web Spam Using Block-based TrustRank
Pages: 369-373
Abstract
Web spamming refers to actions intended to mislead search engines into ranking some pages higher than they deserve. TrustRank is a recent algorithm that can combat web spam. However, the seed set used by TrustRank may not be representative enough to cover the different topics on the Web well. In this paper, we propose using combined page segmentation to select the seed set for the TrustRank algorithm, and using block-level retrieval to rank the candidate seed pages, so that pages that rank highly across multiple topics can serve as the seed set. Experimental results show that our approach deals effectively with the problem of multiple drifting topics and identifies highly desirable pages for the seed set, thereby improving the performance of TrustRank.
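To make the role of the seed set concrete, the following is a minimal sketch of the standard TrustRank propagation step (a PageRank-style iteration biased toward trusted seeds), not the block-based seed selection proposed in the paper. The function name `trustrank`, the toy graph, the damping factor of 0.85, and the iteration count are illustrative assumptions.

```python
def trustrank(out_links, seeds, alpha=0.85, iterations=20):
    """Propagate trust from seed pages along out-links.

    out_links: dict mapping a page to the list of pages it links to.
    seeds: set of pages assumed to be trustworthy (hypothetical input;
           the paper's seed-selection procedure is not reproduced here).
    """
    # Collect every page that appears as a source or a target.
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    # Seed distribution d: uniform over the seeds, zero elsewhere.
    d = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(d)

    for _ in range(iterations):
        # t = alpha * T * t + (1 - alpha) * d
        nxt = {n: (1.0 - alpha) * d[n] for n in nodes}
        for src in nodes:
            targets = out_links.get(src, [])
            if not targets:
                continue  # simplification: trust at dangling pages is dropped
            share = alpha * trust[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        trust = nxt
    return trust


# Toy graph: two good pages link to each other; the spam pages link
# among themselves and to a good page, but no good page links to them.
graph = {
    "good_a": ["good_b"],
    "good_b": ["good_a"],
    "spam_1": ["spam_2", "good_a"],
    "spam_2": ["spam_1"],
}
scores = trustrank(graph, seeds={"good_a"})
```

In this toy graph the spam pages receive no trust because trust flows only along out-links from the seed. The same mechanism explains why seed coverage matters: good pages on topics that no seed reaches also receive little or no trust.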
Keywords: spam, page segmentation, TrustRank.
Article published in International Journal of Current Engineering and Technology, Vol. 2, No. 4 (Dec. 2012)