Overview of Gathering Information from Unfathomable Web Pages using Language Independent and Dependent Procedures
Pages : 128-133
Abstract
The World Wide Web contains a growing number of online Web pages that can be searched only through their Web query interfaces. The returned data records are enwrapped in dynamically generated Web pages, and the query results can be retrieved based on the visual information of those pages, such as layout (location and size) and font. Extracting structured data from unfathomable Web pages is a challenging problem because of the intricate underlying structure of such pages. An unfathomable Web page is any Web content that is not directly accessible through hyperlinks, in particular content behind HTML forms and Web services. Traditional Web extractors focus only on the surface Web, while the unfathomable Web keeps expanding behind the scenes. A large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they depend on the programming language of the Web page. In this paper, we propose a novel approach based on UWPE (unfathomable Web page extractor) that uses the visual features of Web pages. These visual features are used to construct a visual block tree for extracting data from unfathomable Web pages. Our approach is language independent.
Keywords: Unfathomable Web data extraction, vision-based approach, language-independent, language-dependent, search engine
Article published in International Journal of Current Engineering and Technology, Vol.5, No.1 (Feb-2015)