Sushant Goel, Hema Sharda, and David Taniar
School of Electrical and Computer Engineering
Royal Melbourne Institute of Technology
Australia
Email: s2013070@student.rmit.edu.au
Email: hema.sharda@rmit.edu.au
School of Business Systems
Monash University
Australia
Email: David.Taniar@infotech.monash.edu.au
INTRODUCTION AND BACKGROUND:
Grids are an approach to building dynamically constructed problem-solving environments using a distributed and federated data-handling infrastructure that manages geographically and organizationally dispersed resources. This infrastructure has to support data-intensive, high-performance computing applications that manage terabytes or petabytes of information. Handling distributed data raises many research issues, such as transaction scheduling, query execution, and maintaining data consistency. Applications such as high-energy physics, climate modeling, earthquake engineering, and astronomy generate high volumes of data through experimental analyses and simulations, and these data need to be accessible to the global research community. The data are then transferred to local sites for processing. A site may create copies of the data sets and keep them for future reference to reduce transfer latency, so several replicas of a data set may be available in the grid. A replica management service is used in computational grids to locate the closest replica and to access it.
A four-level architecture for computational grids has been proposed in the literature [1]: (1) Fabric, (2) Connectivity, (3) Resource, and (4) Collective. The Fabric layer, the lowest level of the grid architecture, consists of individual resources such as storage systems, networks, and catalogues. The Connectivity layer is concerned with communication and authentication. The Resource layer provides secure remote access to individual resources. Finally, the Collective layer provides coordinated access to multiple resources. The replica management services are located at the Collective layer, as they have to coordinate information between sites.
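As a minimal illustration of replica location, the sketch below assumes a replica catalogue that maps a logical file name to the sites holding a copy, together with a per-site latency estimate. The names `ReplicaCatalogue`, `register`, and `closest_replica`, the example site names, and the use of latency as the measure of closeness are assumptions made for illustration; they do not correspond to the API of any particular grid toolkit.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicaCatalogue:
    """Hypothetical catalogue: logical file name -> sites holding a replica."""
    replicas: dict = field(default_factory=dict)

    def register(self, logical_name: str, site: str) -> None:
        # Record that `site` holds a copy of `logical_name`.
        self.replicas.setdefault(logical_name, set()).add(site)

    def closest_replica(self, logical_name: str, latency_ms: dict) -> str:
        sites = self.replicas.get(logical_name)
        if not sites:
            raise KeyError(f"no replica registered for {logical_name!r}")
        # Sites with unknown latency are treated as infinitely far away;
        # sorting first makes tie-breaking deterministic.
        return min(sorted(sites), key=lambda s: latency_ms.get(s, float("inf")))


catalogue = ReplicaCatalogue()
catalogue.register("climate/run42.dat", "melbourne")
catalogue.register("climate/run42.dat", "geneva")
chosen = catalogue.closest_replica(
    "climate/run42.dat", {"melbourne": 12.0, "geneva": 310.0}
)
```

Under these assumed latencies, the request is served from the lower-latency site rather than transferring the file across the wide-area link.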
PROPOSAL:
Many of the proposed architectures assume that data are transferred to the local site; they also assume that a transaction spans only a single site. If a site does not access the data frequently, the overhead of maintaining a new replica and propagating changes to all replicas may not be justified. The consistency management architecture of data grids has three layers: (1) file transfer layer, (2) replica catalogue, and (3) replica manager. A top layer of consistency services has been proposed recently [3]. In this paper we present a consistency management strategy for transactions that intend to modify data sets but do not access them often. For such transactions, data replication may be expensive because most data replication strategies use files as the unit of replication. Thus, even if a transaction accesses only a few objects in a file, the whole file has to be replicated.
We also address another issue in replica management, arising from the architecture discussed in [2], which does not take the consistency of replicas into consideration. If the data are changed at any one site, the other replicated copies are unaware of the modification and continue to hold stale data in their repositories, so the consistency of the data is not maintained. In this paper we propose and incorporate a strategy in which the replica management services are responsible for maintaining the freshness of the data at all replicated sites. Standard distributed database strategies for data replication may not be applicable in a grid environment, because distributed databases operate on dedicated LANs and on comparatively smaller volumes of data. Security of data is also a major issue in grid environments. Thus there is a need for a new replica management strategy in the grid environment.
Thus, this paper looks at the consistency problem from two perspectives: (1) achieving consistency in grids for transactions that touch more than one site, and (2) maintaining the freshness of data at replicated sites.
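The trade-off between file-level replication and remote access to a few objects can be made concrete with a rough byte-count comparison. The cost model below is an assumption introduced for illustration only (it is not the model used in this paper): replication moves the whole file once plus every later update, while remote access moves only the objects a transaction touches. All function names and figures are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def replication_bytes(*, file_size, expected_updates, update_size):
    # Copy the whole file once, then receive every propagated update.
    return file_size + expected_updates * update_size


def remote_access_bytes(*, accesses, objects_per_access, object_size):
    # Move only the objects each transaction actually touches.
    return accesses * objects_per_access * object_size


def should_replicate(*, file_size, expected_updates, update_size,
                     accesses, objects_per_access, object_size):
    """Return True when creating a local replica moves fewer bytes."""
    replicate = replication_bytes(
        file_size=file_size,
        expected_updates=expected_updates,
        update_size=update_size,
    )
    remote = remote_access_bytes(
        accesses=accesses,
        objects_per_access=objects_per_access,
        object_size=object_size,
    )
    return replicate < remote


# A site that touches a 10 GiB file only three times.
rare = should_replicate(file_size=10 * GIB, expected_updates=20,
                        update_size=MIB, accesses=3,
                        objects_per_access=50, object_size=MIB)

# The same file accessed five hundred times.
frequent = should_replicate(file_size=10 * GIB, expected_updates=20,
                            update_size=MIB, accesses=500,
                            objects_per_access=50, object_size=MIB)
```

With these assumed figures, the rarely accessing site moves about 150 MiB by remote access against more than 10 GiB by replication, so a replica is not worthwhile; the frequently accessing site reverses the comparison.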
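One simple way to picture freshness maintenance is with per-dataset version numbers. The sketch below is an assumption for illustration and is not the strategy proposed in this paper: a hypothetical `ReplicaManager` records the latest version of each data set and the version held at each site, reports which replicas are stale after an update, and marks them refreshed. Actual data transfer, security, and failure handling are omitted.

```python
class ReplicaManager:
    """Hypothetical manager that tracks replica freshness by version number."""

    def __init__(self):
        self.latest = {}        # dataset -> latest version number
        self.site_version = {}  # (dataset, site) -> version held at that site

    def add_replica(self, dataset, site):
        # A new replica starts as a copy of the latest version.
        self.latest.setdefault(dataset, 0)
        self.site_version[(dataset, site)] = self.latest[dataset]

    def record_update(self, dataset, site):
        # A transaction at `site` modified its copy of `dataset`.
        if (dataset, site) not in self.site_version:
            raise KeyError(f"{site!r} holds no replica of {dataset!r}")
        self.latest[dataset] += 1
        self.site_version[(dataset, site)] = self.latest[dataset]

    def stale_sites(self, dataset):
        # Sites whose copy is older than the latest version.
        newest = self.latest[dataset]
        return sorted(
            s for (d, s), v in self.site_version.items()
            if d == dataset and v < newest
        )

    def refresh(self, dataset):
        # Mark every stale replica as updated; returns the sites refreshed.
        stale = self.stale_sites(dataset)
        for site in stale:
            self.site_version[(dataset, site)] = self.latest[dataset]
        return stale


manager = ReplicaManager()
for site in ("rmit", "monash", "cern"):
    manager.add_replica("physics/events.db", site)

manager.record_update("physics/events.db", "rmit")
stale_before = manager.stale_sites("physics/events.db")
refreshed = manager.refresh("physics/events.db")
stale_after = manager.stale_sites("physics/events.db")
```

In this sketch an update at one site leaves the other two replicas stale until the manager refreshes them, which mirrors the problem described above when no service tracks modifications.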
KEYWORDS: Multi-scheduler, Concurrency control, Replication, Recovery and Reliability.
REFERENCES:
[1] I. Foster, C. Kesselman, S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", IJSA 2001.
[2] "An Architecture for Replica Management in Grid Computing Environments", a working document of the Global Grid Forum, http://www.globus.org/datagrid/replica-management.html
[3] D. Düllmann, W. Hoschek, J. Jaen-Martinez, A. Samar, H. Stockinger, K. Stockinger, "Models for Replica Synchronization and Consistency in a Data Grid", 10th IEEE Symposium on High Performance and Distributed Computing (HPDC-10), 2001.