In this paper we present a scalable protocol for conducting periodic probes of network performance in a way that minimizes collisions between separate probes. The goal of the protocol is to enable active performance monitoring of large-scale distributed computational systems and networks. We use the protocol to generate time series of measurement data that are then exposed to numerical forecasting models when a prediction of network performance is required. We present the protocol and demonstrate its effectiveness using the Network Weather Service --- a tool for dynamically predicting network, CPU, memory, and storage performance.
As research and implementation continue to facilitate high performance computing in Java, applications can benefit from resource management and prediction tools. In this work, we present such a tool for network round trip time and bandwidth between a user's desktop and any machine running a web server. JavaNws is a Java implementation and extension of a powerful subset of the Network Weather Service (NWS), a performance prediction toolkit that dynamically characterizes and forecasts the performance available to an application. However, due to the Java language implementation and functionality (portability, security, etc), it is unclear whether a Java program is able to measure and predict the network performance experienced by C-applications with the same accuracy as an equivalent C program. We provide a quantitative equivalence study of the Java and C TCP-socket interface and show that the data collected by the JavaNws is as predictable as, that collected by the NWS (using C).
Providing a ubiquitous service that can both track dynamic performance changes and remain stable in spite of them requires adaptive programming techniques, an architectural design that supports extensibility, and internal abstractions that can be implemented efficiently and portably. In this paper, we describe the current implementation of the Network Weather Service for Unix and TCP/IP sockets and provide examples of its performance monitoring and forecasting capabilities.
In this paper, we focus on the problem of making short and medium term forecasts of CPU availability on time-shared Unix systems. We evaluate the accuracy with which availability can be measured using Unix load average, the Unix utility \verb+vmstat+, and the Network Weather Service CPU sensor that uses both. We also examine the autocorrelation between successive CPU measurements to determine their degree of self-similarity. While our observations show a long-range autocorrelation dependence, we demonstrate how this dependence manifests itself in the short and medium term predictability of the CPU resources in our study.
Dynamically Forecasting Network Performance to Support Dynamic Scheduling Using the Network Weather Service (short version) (compressed postscript) Rich Wolski, in the Proceedings of the 6th High-Performance Distributed Computing Conference, August, 1997.
The Network Weather Service is a generalizable and extensible facility designed to provide dynamic resource performance forecasts in metacomputing environments. In this paper, we outline its design and detail the predictive performance of the forecasts it generates. While the forecasting methods are general, we focus on their ability to predict the TCP/IP end-to-end throughput and latency that is attainable by an application using systems located at different sites. Such network forecasts are needed both to support scheduling, and by the metacomputing software infrastructure to develop quality-of-service guarantees.
We describe the architecture of the Network Weather Service and implementations that we have developed and are currently deploying for the Legion and Globus/Nexus metacomputing infrastructures. We also detail NWS forecasts of resource performance using both the Legion and Globus/Nexus implementations. Our results show that simple forecasting techniques substantially outperform measurements of current conditions (commonly used to gauge resource availability and load) in terms of prediction accuracy.
In this paper, we investigate G-commerce --- computational economies for controlling resource allocation in Computational Grid settings. We define hypothetical resource consumers (representing users and Grid-aware applications) and resource producers (representing resource owners who ``sell'' their resources to the Grid). We then measure the efficiency of resource allocation under two different market conditions: commodities markets and auctions. We compare both market strategies in terms of price stability, market equilibrium, consumer efficiency, and producer efficiency. Our results indicate that commodities markets are a better choice for controlling Grid resources than previously defined auction strategies.
This paper has been submitted to IPDPS00.
The thesis of this proposal is that Grid resource allocation is best structured as a market economy, where prices are dynamically affixed to resources according to their supply and demand. We call term research G-Commerce, since we apply economic principles to Grid computing, as e-commerce applies economic principles to Internet computing. Resource prices are determined by a distributed pricing entity that we term The First Bank of G. This entity dynamically monitors resource supply and demand, and sets prices based on current levels so that supply matches demand (prices are fair) and the overall allocation of resources is stable. This combination of fair price determination and system-wide stability differentiates G-Commerce from auction-based approaches to the brokering of resources.
he Computational Grid has recently been proposed for the implementation of high-performance applications using widely dispersed computational resources. The goal of a Computational Grid is to aggregate ensembles of shared, heterogeneous, and distributed resources (potentially controlled by separate organizations) to provide computational ``power'' to an application program.
In this paper, we provide a toolkit for the development of Grid applications. The toolkit, called EveryWare, enables an application to draw computational power transparently from the Grid. The toolkit consists of a portable set of processes and libraries that can be incorporated into an application so that a wide variety of dynamically changing distributed infrastructures and resources can be used together to achieve supercomputer-like performance. We provide our experiences gained while building the EveryWare toolkit prototype and the first true Grid application.
This paper investigates the efficacy of Application-Level Scheduling (AppLeS) for a parallel gene sequence library comparison application in production metacomputing settings. We compare an AppLeS-enhanced version of the application to an original implementation designed and tuned to use the native scheduling mechanisms of Mentat -- a metacomputing software infrastructure. The experimental data shows that the AppLeS versions outperform the best Mentat versions over a range of problem sizes and computational settings.
In this paper, we define a set of principles underlying application-level scheduling and describe our work-in-progress building AppLeS (application-level scheduling) agents. We illustrate the application-level scheduling approach with a detailed description and results for a distributed 2D Jacobi application on 2 heterogeneous platforms.
While running on parallel distributed resources, schedulers may find it advantageous to redistribute elements of a computation in response to changing conditions. In this paper, we focus on the development of dynamically parametrizable models to determine the cost (in terms of execution delay) of performing redistribution. We illustrate our approach by examining in detail the modeling of redistribution costs for a 2D Jacobi application running in a cluster of workstations environment.
We focus om the problems of scheduling applications on metacomputing systems. We intoduce the concept of application-centric scheduling in which everything about the system is evaluated in terms of its impact on the application. Application-centric scheduling is used by virtually all metacomputer programmers to achieve performance on metacomputing systems. We describe two successful metacomputing appli and Panoramacations to illustrate this approach, and describe AppLeS scheduling agents which generalize the application-centric scheduling approach. Finally, we show preliminary results which compare AppLeS-dervied schedules with conventional strip and blocked schedules for a two-dimensional Jacobi code.
Many parallel compilation systems represent programs internally as Directed Acyclic Graphs (DAGs). However, the storage of these DAGs be In this paper we describe a compile-time scheduling methodology for hierarchical DAG programs represented in the IFX intermediate form. The method we present is itself hierarchical reducing the storage that would otherwise be required by a single flat DAG representation. We describe the scheduling model and demonstrate the method using the Optimizing Sisal Compiler and two scientific applications.
We describe Zoom, a hierarchical representation in which heterogeneous applications can be described. The goal of Zoom is to provide an abstraction that computer and computational scientists can use to describe heterogeneous applications, and to provide a foundation from which program development tools for heterogeneous network computing can be built. Three levels (structure, implementation and data) of the Zoom hierarchy are described and are used to illustrate two heterogeneous applications. Extensions to Zoom to include additional resource parameters required by program development tools are also discussed.
We couple the Zoom representation designed to facilitate development of heterogeneous applications, and the HeNCE graphical language and tool, designed as a representation for and an executional model of heterogeneous programs targeted to PVM. The combination of Zoom and HeNCE provides a hierarchical representation which exposes performance issues and a means of automatically translating that representation into code executable on heterogeneous networks of computers.
The cost of hardware cache-coherence, both in terms of execution delay and operational cost, is substantial for scalable systems. Fortunately, compiler generated cache management can reduce program serialization due to cache-contention and increase execution performance. It can also reduce the cost of parallel systems by eliminating the need for more expensive hardware support. In this paper, we use Sisal functional language system as a vehicle to implement and investigate automatic, compiler based cache management. We describe our implementation of Sisal for the IBM Power/4. The Power/4, briefly available as a product, represents an early attempt to build a shared-memory machine that relies strictly on the language system for cache-coherence. We discuss the issues associated with deterministic execution and program correctness on a system without hardware coherence, and demonstrate how Sisal (as a functional language) is able to address those issues.
Compiler Enforced Cache Coherence Using a Functional Language Rich Wolski and David Cann, Journal of Scientific Programming, December, 1995.
A revised version of the conference paper whcih includes a discussion of imperative compilation techniques for the Power/4.
Programming languages are the most important tool at a programmer's disposal. All other tools correct, visualize, or evaluate the product crafted by this tool. The advent of multiprocessor computer systems has greatly complicated the programmer's task and increased his need for high-level languages capable of automatically taming these architectures. In this paper, we describe a prototype implementation of Sisal for multiprocessor, hierarchical-memory systems. The implementation includes explicit compiler and runtime control that effectively exploits the different levels of memory and manages interprocess communications (IPC). We give preliminary performance results for this system on the BBN TC2000.