Erik Elmroth and Johan Tordsson
Department of Computing Science and HPC2N
Umeå University
SE-901 87 Umeå, Sweden
emails: elmroth@cs.umu.se and tordsson@cs.umu.se respectively
This contribution presents algorithms, methods, and software for a grid resource manager responsible for resource brokering and scheduling in early production grids. The broker selects computing resources based on actual job requirements and a number of criteria identifying the available resources, with the aim of minimizing the total time to delivery for the individual application. The total time to delivery includes the time for program execution, batch queue waiting, input/output data transfer, executable staging, etc. Main features include support for making advance reservations, resource selection based on computer benchmark results and network performance predictions, and a basic adaptation facility.
The reservation capability is vital for enabling co-allocation of resources in highly utilized grids. This feature also provides a guaranteed alternative to estimating the batch queue waiting time.
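The selection criterion can be illustrated with a minimal sketch. It assumes a purely additive model in which staging, waiting, execution, and output transfer do not overlap; the class, field, and function names are hypothetical and are not taken from the broker software. When a reservation is held, the known delay until the reserved slot replaces the estimated queue waiting time.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResourceEstimate:
    """Per-resource time estimates in seconds (hypothetical structure)."""
    name: str
    stage_in_s: float        # input data transfer and executable staging
    queue_wait_s: float      # estimated batch queue waiting time
    execution_s: float       # estimated program execution time
    stage_out_s: float       # output data transfer
    # Delay until a reserved slot starts, if an advance reservation is held.
    reserved_start_s: Optional[float] = None


def total_time_to_delivery(r: ResourceEstimate) -> float:
    """Sum the components, assuming (for illustration) no overlap between them."""
    # A reservation gives a guaranteed start delay instead of an estimate.
    wait = r.queue_wait_s if r.reserved_start_s is None else r.reserved_start_s
    return r.stage_in_s + wait + r.execution_s + r.stage_out_s


def select_resource(candidates: List[ResourceEstimate]) -> ResourceEstimate:
    """Pick the candidate with the smallest estimated total time to delivery."""
    return min(candidates, key=total_time_to_delivery)
```

A real broker would likely overlap some of these phases, for example staging input while the job waits in the queue, so the additive sum should be read as a simplification.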
The performance differences between grid resources, and the fact that their relative performance characteristics may vary for different types of applications, make resource selection difficult. This issue is handled by a benchmark-based procedure for resource selection. Based on the user's identification of relevant benchmarks and an estimated execution time on some specified resource, the broker estimates the execution time for all resources of interest. This requires that a relevant set of benchmark results is available from the resources' information systems. To facilitate the production, collection, and publication of such results, our work includes developing a grid-resource benchmark suite.
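One way such an estimate could be computed is sketched below. This is an assumption for illustration, not the procedure used by the broker: benchmark results are taken to be "higher is better" performance scores, execution time is assumed to scale inversely with them, and several benchmarks are combined by a user-supplied weighted average. The function name and the benchmark names in the usage note are hypothetical.

```python
from typing import Dict


def estimate_execution_time(
    reference_time_s: float,
    reference_scores: Dict[str, float],
    target_scores: Dict[str, float],
    weights: Dict[str, float],
) -> float:
    """Scale a known execution time from a reference resource to a target.

    Assumes (for illustration only) that scores are higher-is-better and
    that run time is inversely proportional to each relevant score.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one benchmark needs a positive weight")
    # Weighted average of per-benchmark slowdown factors (reference / target).
    slowdown = sum(
        w * reference_scores[name] / target_scores[name]
        for name, w in weights.items()
    ) / total_weight
    return reference_time_s * slowdown
```

For example, with a reference run of 1000 s, equal weights on two hypothetical benchmarks `cpu_bench` and `mem_bench`, and a target that is twice as fast on the first and equally fast on the second, the sketch yields 1000 × (0.5 + 1.0) / 2 = 750 s.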
As most grid resources are still available without performance and queue time guarantees, the adaptation feature of this resource manager is useful. It allows a user to request that the broker, after an initial job submission, strive to redirect the job to a resource likely to give a shorter total time to delivery.
A resource broker and scheduler is fundamental in any large-scale grid environment for scientific applications. Our software is primarily developed for NorduGrid, comprising nearly 30 parallel systems in 6 countries, and SweGrid, comprising 6 Swedish Linux clusters dedicated to grid usage with a total of 600 CPUs. Both NorduGrid and SweGrid are Globus-based production environments for 24-hour-per-day grid usage.