An Approach towards Record Linkage using Genetic Algorithm along with Hash Algorithm
Pages : 2142-2146
Abstract
Systems that depend on data integrity to offer high-quality services, such as digital libraries and e-commerce brokers, may be affected by duplicate records in their data warehouses. Duplicates increase the time required to retrieve high-quality data. In this approach, deduplication (record linkage) is performed using hash algorithms, namely MD5 and SHA-1, to find similarity and detect duplicate records, and an evolutionary technique, the genetic algorithm, is used to eliminate them. The approach removes duplicate samples from the dataset.
Keywords: Cosine similarity, dataset, genetic algorithm, MD5, SHA-1, string distance.
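The hash-based detection step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the field normalization, and the sample records are all assumptions, and the genetic-algorithm stage is omitted because the abstract gives no details about it. The sketch only catches records that become identical after normalization.

```python
import hashlib


def record_fingerprint(record, algorithm="sha1"):
    """Return a hex digest for a record (hypothetical helper).

    Each field is lowercased and its whitespace collapsed before hashing,
    so trivially different spellings of the same record share a digest.
    """
    normalized = "|".join(" ".join(str(field).lower().split()) for field in record)
    return hashlib.new(algorithm, normalized.encode("utf-8")).hexdigest()


def remove_duplicates(records, algorithm="sha1"):
    """Keep the first occurrence of each distinct fingerprint."""
    seen = set()
    unique = []
    for record in records:
        digest = record_fingerprint(record, algorithm)
        if digest not in seen:
            seen.add(digest)
            unique.append(record)
    return unique


# Illustrative data only; not taken from the article.
records = [
    ("John Smith", "12 Main St"),
    ("john  smith", "12 main st "),
    ("Jane Doe", "4 Oak Ave"),
]
unique_records = remove_duplicates(records)
```

Passing `algorithm="md5"` selects MD5 instead of SHA-1 where the local Python build permits it. Because cryptographic digests change completely on any difference in input, near-duplicates that survive normalization would need a separate similarity measure such as the string distance or cosine similarity named in the keywords.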
Article published in International Journal of Current Engineering and Technology, Vol. 4, No. 3 (June 2014)