News Updates Monday 25th Nov 2024 :
  • Welcome to INPRESSCO, world's leading publishers, We have served more than 10000+ authors
  • Articles are invited in engineering, science, technology, management, industrial engg, biotechnology etc.
  • Paper submission is open. Submit online or at editor.ijcet@inpressco.com
  • Our journals are indexed in NAAS, University of Regensburg Germany, Google Scholar, Cross Ref etc.
  • DOI is given to all articles

Improving Efficiency of GEO-Distributed Data Sets using Pact


Author : Kirtimalini N. Kakade and T.A.Chavan

Pages : 1284-1287
Download PDF
Abstract

In an Internet era, a report says every day 2.5 quintillion bytes of data is created. This data is obtained from many sources such as sensors to gather climate information, trajectory information, transaction records, web site usage data etc. This data is known as Big data. Hadoop is only scalable that is it can reliably store and process petabytes. Hadoop plays an important role in processing and handling big data It includes MapReduce – offline computing engine, HDFS – Hadoop Distributed file system, HBase – online data access.Map Reduce functions as dividing input files into chunks and processing these in a series of parallelizable steps., mapping and reducing constitute the essential phases for a Map Reduce job. As this freamework provides solution for large data nodes by providing distributed environment. Moving all input data to a single datacenter before processing the data is expensive. Hence we concentrate on geographical distribution of geo-distributed data for sequential execution of map reduce jobs to optimize the execution time. But it is observed from various results that mapping and reducing function is not sufficient for all type of data processing. The fixed execution strategy of map reduce program is not optimal for many task as it does not know about the behavior of the functions. Thus, to overcome these issues, we are enhancing our proposed work with parallelization contracts. These contracts help to capture a reasonable amount of semantics for executing any type of task with reduced time consumption. The parallelization contracts include input and output contract which includes the constraints and functions of data execution The main aim of this paper is to discuss various known Map reduce technology techniques available for geodistributed data sets by using different techniques. Further, the paper also discloses the implementation of these techniques, their advantages, disadvantages, and the results measured. Future trends including use of query optimizing techniques to improve the results of the query as well as reduce the cost for the computation. To achieve this we use the indexing mechanism to the cache system to preserve the query search results.

Keywords: Geodistributed , MaReduce, PACT, big data

Article published in International Journal of Current  Engineering  and Technology, Vol.4,No.3 (June- 2014)

 

 

Call for Papers
  1. IJCET- Current Issue
  2. Issues are published in Feb, April, June, Aug, Oct and Dec
  3. DOI is given to all articles
  • Inpressco Google Scholar
  • Inpressco Science Central
  • Inpressco Global impact factor
  • Inpressco aap

International Press corporation is licensed under a Creative Commons Attribution-Non Commercial NoDerivs 3.0 Unported License
©2010-2023 INPRESSCO® All Rights Reserved