Matthias Korch and Thomas Rauber
Faculty of Mathematics, Physics, and Computer Science
University of Bayreuth
emails: {matthias.korch, rauber}@uni-bayreuth.de
Modeling many scientific and engineering problems leads to systems of ordinary differential equations (ODEs). Solving such systems by numerical integration requires large amounts of computational resources, particularly if the ODE system is large or the right-hand-side function of the system is expensive to evaluate. There is therefore a demand for efficient parallel solution methods that solve such systems faster.
We aim to develop new, fast implementations of parallel ODE solvers, starting with an analysis of existing parallel and sequential methods. This analysis helps detect scalability bounds and other sources of performance degradation. The paper considers the simulation-based analysis of embedded Runge-Kutta (RK) methods. In particular, we study two approaches: locality optimizations for general sequential embedded RK methods, based on program transformations similar to [1]; and locality optimization and parallelization using pipelining, which exploits the characteristic access structure of an important class of ODEs resulting from the spatial discretization of partial differential equations (PDEs) [2]. In contrast to [2], the parallel implementations are multi-threaded and use Pthreads.
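For readers unfamiliar with embedded RK methods, the following is a minimal sketch of a single time step, not the implementation analyzed in the paper. The function name `embedded_rk_step`, its parameters, and the choice of the Heun-Euler pair as the example method are assumptions made for illustration. The nested loops over stages and vector components indicate the kind of loop structure that locality-oriented program transformations would typically reorder.

```python
# Illustrative sketch only: one step of a generic explicit embedded RK method.
# All names here are hypothetical and not taken from the paper.

def embedded_rk_step(f, t, y, h, A, b, b_hat, c):
    """Advance y' = f(t, y) by one step of size h.

    A, c     : coefficients of an explicit s-stage method
               (A is strictly lower triangular)
    b, b_hat : weights of the two embedded solutions
    Returns (y_new, err), where err is the max-norm of the difference
    between the two solutions and serves as a local error estimate.
    """
    n, s = len(y), len(b)
    k = []  # stage vectors k[0..s-1], each of length n
    for i in range(s):
        # Argument vector of stage i: y + h * sum_{j<i} A[i][j] * k[j]
        w = [y[m] + h * sum(A[i][j] * k[j][m] for j in range(i))
             for m in range(n)]
        k.append(f(t + c[i] * h, w))
    # Two approximations built from the same stage vectors
    y_new = [y[m] + h * sum(b[i] * k[i][m] for i in range(s))
             for m in range(n)]
    y_hat = [y[m] + h * sum(b_hat[i] * k[i][m] for i in range(s))
             for m in range(n)]
    err = max(abs(y_new[m] - y_hat[m]) for m in range(n))
    return y_new, err


# Heun-Euler pair (orders 2 and 1), a small standard embedded method.
A = [[0.0, 0.0], [1.0, 0.0]]
c = [0.0, 1.0]
b = [0.5, 0.5]      # second-order weights (Heun)
b_hat = [1.0, 0.0]  # first-order weights (explicit Euler)

# Test problem y' = -y with y(0) = 1 and step size h = 0.1.
y1, err = embedded_rk_step(lambda t, y: [-v for v in y],
                           0.0, [1.0], 0.1, A, b, b_hat, c)
```

In a complete solver, the error estimate `err` would be compared with a tolerance to accept or reject the step and to choose the next step size; that control logic is omitted here.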
Investigations on real computer systems, as presented in [1,2], show that using real systems for software analysis has several limitations. For example, only the limited set of hardware events exposed by the manufacturer of the system can be measured; the execution of the program under analysis may be disturbed by other user processes competing for shared resources; and the measurement itself may influence its outcome if additional code has to be inserted into the program or if the program has to be interrupted in order to read the state of the machine. Additional experiments using simulators can help overcome these problems. Particularly promising are instruction-set-level simulators, which enable full-system simulation of real parallel architectures and provide a wide range of instrumentation facilities. Because the simulation operates at such a low level, the observed behavior can be very close to that of real hardware. Most importantly, applications can be instrumented nonintrusively while the full state of the machine remains visible. Moreover, the influence of the hardware architecture on the performance of an application can be investigated by modifying the simulated hardware.
One simulator that promises to provide all these features is Simics [3]. It has already been used successfully to investigate the cache memory behavior of PDE solvers [4]. The full paper uses Simics to analyze the locality behavior and scalability of several sequential and parallel multi-threaded versions of embedded RK solvers. We present experiments performed on a simulated system implementing the SPARC V9 ISA and compare the results with experiments performed on a real Sun Fire server. As examples, we use sparse ODE systems resulting from applying the method of lines to time-dependent PDEs, and dense ODE systems resulting from applying spectral methods to time-dependent PDEs. We show that the simulations can be used to guide program transformations and restructurings that improve the performance of the investigated methods.
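To illustrate why the method of lines yields sparse ODE systems, here is a small sketch that assumes the 1D heat equation u_t = u_xx with zero boundary values and second-order central differences. This is an assumed textbook example, not one of the test problems of the paper, and the name `heat_rhs` is hypothetical. Each component of the right-hand side reads only its two neighbors, which is the kind of limited access distance that a pipelined computation scheme can exploit.

```python
# Illustrative sketch only: right-hand side of the ODE system obtained by
# applying the method of lines to u_t = u_xx on a uniform 1D grid.
# The PDE, the boundary treatment, and all names are assumptions.

def heat_rhs(t, u, dx=1.0):
    """Return du/dt for the interior grid values u[0..n-1].

    Values outside the grid are taken as 0 (homogeneous Dirichlet
    boundary conditions). Component i depends only on u[i-1], u[i],
    and u[i+1], so the Jacobian of the system is tridiagonal.
    """
    n = len(u)
    du = [0.0] * n
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        du[i] = (left - 2.0 * u[i] + right) / (dx * dx)
    return du


# A single unit spike in the middle of a three-point grid.
result = heat_rhs(0.0, [0.0, 1.0, 0.0], dx=1.0)
```

A spectral discretization, by contrast, generally couples every component with every other one, which leads to the dense systems mentioned above.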
References:
[1]. T. Rauber and G. Rünger. Optimizing locality for ODE solvers. In
Proc. of 15th ACM Int. Conf. on Supercomputing (ICS 2001), pages
123-132. ACM Press, June 2001.
[2]. M. Korch and T. Rauber. Scalable parallel RK solvers for ODEs
derived by the method of lines. In Euro-Par 2003. Parallel
Processing (LNCS 2790), pages 830-839. Springer, August 2003.
[3]. P. S. Magnusson, M. Christensson, J. Eskilson, D. Forsgren,
G. Hållberg, J. Högberg, F. Larsson, A. Moestedt and B. Werner.
Simics: A full system simulation platform. Computer,
35(2):50-58, February 2002.
[4]. D. Wallin, H. Johansson and S. Holmgren. Cache memory behavior of
advanced PDE solvers. Technical Report 2003-044, Department of
Information Technology, Uppsala University, August 2003. A short
version of this paper will appear in the proceedings of Parallel
Computing 2003 (ParCo 2003).