Record Normalization by Eliminating Duplicate Entries from Multiple Sources
Pages: 335-339
Abstract
Bulk data is generated from various sources, and these sources may provide duplicate records with representational variations. Mining such big data to produce a single representative record is a challenging task. The value of the data increases when it is linked with similar resources and when similar data is fused into one source. Much research has been done on producing a single representative record for each real-world entity by removing duplicate records; this task is called record normalization. The proposed technique focuses on improving the precision of record normalization compared with existing strategies. It applies normalization at the record level, the field level, and the value level, and the precision of the unique representation increases at each level. Along with producing a unique representation, the data is linked with similar resources by comparing similar record field values. The system is tested on a citation-record dataset, and its accuracy and execution time are compared.
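The three normalization levels mentioned above can be illustrated with a minimal sketch. The cluster of duplicate citation records, the field names, and the selection heuristics (most complete record, most frequent field value, longest variant of the most frequent normalized value) are illustrative assumptions, not the paper's actual algorithm:

```python
from collections import Counter

# Hypothetical cluster of duplicate citation records (field -> value),
# with representational variations across sources.
records = [
    {"title": "Record Normalization", "venue": "IJCA", "year": "2018"},
    {"title": "Record normalization", "venue": "Intl. J. of Computer Applications", "year": "2018"},
    {"title": "record normalization", "venue": "IJCA", "year": ""},
]

def record_level(cluster):
    """Record level: pick the single most complete record as the representative."""
    return max(cluster, key=lambda r: sum(1 for v in r.values() if v))

def field_level(cluster):
    """Field level: for each field, take the most frequent non-empty value."""
    rep = {}
    for field in cluster[0]:
        values = [r[field] for r in cluster if r[field]]
        rep[field] = Counter(values).most_common(1)[0][0] if values else ""
    return rep

def value_level(cluster):
    """Value level: group field values by a normalized form (here, lowercase),
    take the most frequent group, and keep its longest surface variant."""
    rep = {}
    for field in cluster[0]:
        values = [r[field] for r in cluster if r[field]]
        if not values:
            rep[field] = ""
            continue
        canonical = Counter(v.lower() for v in values).most_common(1)[0][0]
        variants = [v for v in values if v.lower() == canonical]
        rep[field] = max(variants, key=len)
    return rep

print(field_level(records))
```

Each level trades granularity for precision: record-level selection is cheap but keeps any errors of the chosen record, while field- and value-level selection can combine the best parts of several duplicates into one representative.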
Keywords: Record normalization, data clustering, data fusion, data linking, data integration