PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004

Updated: 15 February 2004

Efficient Execution of Scientific Computation on Long-distance Geographically Distributed Clusters

Eduardo Argollo, Dolores Rexachs, and Emilio Luque
Computer Science Department,
Universitat Autònoma de Barcelona
08193 Bellaterra, Spain
emails: eargollo@aows10.uab.es; {dolores.rexachs, emilio.luque}@uab.es

One increasingly widespread approach to data-intensive computation is the use of dedicated Heterogeneous Networks of Workstations (HNOWs) with standard software and libraries. Over the past decades the Internet has evolved into a real alternative for interconnecting geographically distributed HNOWs, although obtaining effective collaboration in such systems is not trivial.

The target application is the Matrix Multiplication (MM) algorithm. It is a simple but important linear algebra kernel [3] used by a wide range of scientific applications, yet its execution over processors of different speeds, as in an HNOW, turns out to be surprisingly difficult: the LIP group [1] proved the NP-completeness of this problem. MM is an O(n^3) algorithm, highly scalable, and its workload can be easily managed.
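As an illustration of why the MM workload is easy to manage, the following minimal sketch computes the product block by block, so that each (row block, column block) pair is an independent unit of work that could be handed to a different worker. It is a generic blocked algorithm in plain Python, not the implementation used in this work; the function name `matmul_blocked` and the `block` parameter are assumptions made for the example.

```python
def matmul_blocked(a, b, block):
    """Multiply matrices a (n x k) and b (k x m), given as lists of rows.

    The three outer loops walk over blocks; each (i0, j0) pair identifies
    one output block that depends only on a row band of a and a column
    band of b, so it could be computed by a separate worker.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, m, block):
            for k0 in range(0, k, block):
                # Accumulate the contribution of one block product.
                for i in range(i0, min(i0 + block, n)):
                    row_a, row_c = a[i], c[i]
                    for kk in range(k0, min(k0 + block, k)):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, min(j0 + block, m)):
                            row_c[j] += a_ik * row_b[j]
    return c
```

For square n x n matrices the innermost statement runs n^3 times in total regardless of the block size, which is the O(n^3) cost mentioned above; the block size only changes how the work is grouped.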

This paper describes the results of our efforts to obtain effective collaboration in executing the MM algorithm on long-distance, geographically distributed clusters (a collection of HNOWs, or CoHNOW) interconnected by standard Internet links. This work produced a system architecture, an analytical system model, and an application tuning methodology.

To achieve efficient collaboration between the clusters, it is essential to use workload distribution policies that make the best use of the resources. This need is greater when the standard Internet is used, since its latency and throughput are unpredictable, and since the chosen application has high data communication needs.
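One simple workload distribution policy, shown here only as a hedged sketch, is to assign blocks in proportion to each worker's measured speed. This is a generic heuristic and not necessarily the policy adopted in this work; the function name `split_blocks` and the idea of expressing speed as a single relative number are assumptions made for the example.

```python
def split_blocks(total_blocks, speeds):
    """Split total_blocks among workers in proportion to their speeds.

    speeds holds one positive relative speed per worker (assumed to be
    measured beforehand). Rounding leftovers go to the workers with the
    largest fractional share, so the result always sums to total_blocks.
    """
    total_speed = sum(speeds)
    ideal = [total_blocks * s / total_speed for s in speeds]
    shares = [int(x) for x in ideal]  # round every share down first
    leftover = total_blocks - sum(shares)
    # Hand out the remaining blocks by largest fractional remainder.
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: ideal[i] - shares[i],
                          reverse=True)
    for i in by_remainder[:leftover]:
        shares[i] += 1
    return shares
```

With `split_blocks(10, [1, 1, 2])` the worker that is twice as fast receives 5 of the 10 blocks. A static split like this ignores communication cost, which is exactly what becomes significant over unpredictable Internet links.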

The system architecture evolved so that the CoHNOW is organized as a hierarchical Master/Worker collection of HNOWs, with MPI for intra-cluster communication. To optimize inter-cluster communication performance, guarantee transparency, and isolate and recover from Internet connection failures, we propose dedicating one workstation per HNOW [4] to long-distance communication, using specially developed software (the Communication Manager). Pipeline strategies are implemented at each communication level in order to overlap communication and computation.
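The idea of overlapping communication and computation can be sketched with a bounded buffer: one thread keeps fetching the next blocks while the main thread computes on blocks already received. The sketch below uses Python threads rather than MPI and is not the Communication Manager itself; `run_pipeline`, `fetch`, `compute`, and `depth` are hypothetical names chosen for the example.

```python
import queue
import threading


def run_pipeline(tasks, fetch, compute, depth=2):
    """Overlap fetch(task) with compute(data) using a bounded queue.

    fetch stands in for receiving a block over the network and compute
    for the local work on it. At most `depth` fetched blocks wait in the
    buffer, so communication runs ahead of computation by a few blocks.
    """
    buffer = queue.Queue(maxsize=depth)
    end_marker = object()  # signals that no more blocks will arrive

    def producer():
        try:
            for task in tasks:
                buffer.put(fetch(task))  # blocks while the buffer is full
        finally:
            buffer.put(end_marker)

    thread = threading.Thread(target=producer, daemon=True)
    thread.start()

    results = []
    while True:
        item = buffer.get()  # blocks while the buffer is empty
        if item is end_marker:
            break
        results.append(compute(item))
    thread.join()
    return results
```

A call such as `run_pipeline(block_ids, receive_block, multiply_block)` would return the results in task order. The buffer depth is a tuning knob: a deeper buffer hides longer latency spikes at the cost of more memory.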

Several characteristics of the environment (worker performance and intra- and inter-cluster network throughput) and of the algorithm (workload size and block management [2]) were examined to derive an analytical system model that predicts the execution performance and its behavior over time. The proposed methodology describes how to tune the application by adjusting some of these parameters, and it led us to the strategies that reach the highest performance for the MM execution.
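The abstract does not give the model's equations, so the following is only a rough sketch of the kind of estimate such a model might start from. It assumes perfect overlap of communication and computation, so the execution time is bounded by the slower of the two; this simplification, the function name `predict_time`, and all of its parameters are assumptions for illustration, not the model developed in this work.

```python
def predict_time(flops, cluster_speed, bytes_moved, throughput):
    """Estimate execution time in seconds under ideal overlap.

    flops          total floating-point operations assigned to a cluster
    cluster_speed  aggregate speed of its workers, in flop/s
    bytes_moved    data exchanged over the inter-cluster link, in bytes
    throughput     sustained link throughput, in bytes/s
    """
    compute_time = flops / cluster_speed
    communication_time = bytes_moved / throughput
    # With perfect overlap, the slower activity determines the total.
    return max(compute_time, communication_time)
```

When the communication term dominates, adding workers to the remote cluster does not help, which suggests why parameters such as workload size and block size are the ones worth tuning.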

Experiments to validate our work were carried out on two geographically separated clusters, one located in Brazil and the other in Spain. Each cluster is a dedicated HNOW, and the two are interconnected by standard Internet connections.

The experiments show that the expected temporal behavior is achieved and that the difference between predicted and measured performance is less than 10%. Tuning the application with the proposed methodology gives the CoHNOW 90% of the maximum achievable performance for the selected parameter values.

References:
[1] Beaumont O., F. Rastello, and Y. Robert, Matrix Multiplication on Heterogeneous Platforms, IEEE Trans. on Parallel and Distributed Systems, vol. 12, no. 10, October 2001.
[2] Dongarra J., J. Du Croz, S. Hammarling, and I. Duff, A Set of Level 3 Basic Linear Algebra Subprograms, ACM Trans. Math. Soft., 16 (1), pp. 1-17, 1990.
[3] Dongarra J. and D. Walker, Libraries for Linear Algebra, in Sabot G. W. (Ed.), High Performance Computing: Problem Solving with Parallel and Vector Architectures, Addison-Wesley Publishing Company, Inc., pp. 93-134, 1995.
[4] Furtado A., J. Souza, A. Rebouças, D. Rexachs, and E. Luque, Architectures for an Efficient Application Execution in a Collection of HNOWS, in D. Kranzlmüller et al. (Eds.), Euro PVM/MPI 2002, LNCS 2474, pp. 450-460, 2002.
