G. Radons, G. Rünger, M. Schwind and H. Yang
Institute of Physics
and Department of Computer Science
of Technical University Chemnitz
Chemnitz, Germany
email: radons@physik.tu-chemnitz.de
In this paper we consider the calculation of the full spectrum of Lyapunov exponents and associated Lyapunov vectors for high-dimensional, continuous time nonlinear dynamical systems. Lyapunov vectors found much interest recently, because they can exhibit collective wave-like structures. The standard method of Bennetin et al. is employed to calculate these quantities for our many-particle (N=100-1000) Lennard-Jones system in d dimensions. A system of 2dN*2dN linear and 2dN nonlinear ordinary differential equations has to be integrated simultaneously in order to obtain the dynamics of 2dN offset vectors in tangent space and the reference trajectory in phase space, respectively.
For the calculation of the Lyapunov exponents and vectors the offset vectors have to be reorthogonalized periodically using either Gram-Schmidt orthogonalization or QR decomposition. To obtain scientifically useful results one needs particle numbers in the above range and long integration times for the calculation of certain long time averages. This enforces the use of parallel implementations of the corresponding algorithms. It turns out that the repeated reorthogonalization is the most time consuming part of the algorithm. The algorithmic challenge for the parallelization lies on one hand on the parallel implementation of the reorthogonalization algorithm itself. On the other hand the results of the latter have to be fed back to the molecular dynamics integration routine, providing new initial conditions for the next reorthogonalization step, and so on. This alternation between reorthogonalization and molecular dynamics integration requires in addition an optimal balancing of computation and communication on distributed memory machines or cluster of SMPs.
As parallel reorthogonalization procedure we have realized and tested several parallel versions of Gram-Schmidt orthogonalization and of QR factorization based on blockwise Householder reflection. The parallel version of classical Gram-Schmidt (CGS) orthogonalization is enriched by a reorthogonalization test which avoids a loss of orthogonality by dynamically using iterated CGS. All parallel procedures are based on a 2-dimensional logical processor grid and a corresponding block-cyclic data distribution of the matrix of offset vectors. Row-cyclic and column-cyclic distributions are included due to parametrized block sizes, which can be chosen appropriately. Special care was also taken to offer a modular structure and the possibility for including efficient sequential basic operations, like from BLAS, in order to efficiently exploit the processor or node architecture. For comparison we consider standard library routines like QR factorization from ScaLAPACK.
The interface between the parallel reorhogonalization and the integration procedure guarantees an invariant data distribution, which may require an automatic redistribution, so that different parallel orthogonalizations can be used within the integration process to adapt the parallel efficiency to the specific hardware properties. Performance tests are performed on a Beowulf cluster, a cluster of dual Xeon nodes, and an IBM SP4 at NIC, Juelich. Our emphasis is on a flexible combination of parallel reorthogonalization procedures and molecular dynamics integration that allows an easy restructuring of the parallel program. The goal is to exploit the characteristics of processors or nodes and of the interconnections network of the parallel hardware effectively by choosing an efficient combination of basic routines and parallel orthogonalization algorithm, so that computation of Lyapunov spectrum and Lyapunov vectors can be performed in the most efficient way in order to be able to simulate very large problems.