PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004 (Home page)
Updated: 20 February 2004
Optimization of a Sparse Hypermatrix Cholesky Factorization
Jose R. Herrero and Juan J. Navarro
Universitat Politecnica de Catalunya
Spain
Abstract:
We will present our work on a sparse Cholesky factorization based on the
Hypermatrix [1,2] data structure. The matrix is partitioned recursively
into blocks of different sizes. The HM structure consists of several
(N) levels of submatrices. The top N-1 levels hold pointer matrices
which point to the next lower level submatrices. Only the last (bottom)
level holds data matrices. Data matrices are stored as dense matrices
and operated as such. Null pointers in pointer matrices indicate that
the corresponding submatrix does not have any non-zero elements and is
therefore unnecessary.
Working on dense matrices has several advantages. First, it simplifies
the data structure. Second, it allows for the usage of level 3 BLAS
routines. However, it also has one disadvantage, namely the storage and
computation on zero elements. We have been trying to reduce the large
overhead introduced by such zeros. In a recent paper [3], we showed how
we could improve performance by producing very efficient operations
working on small matrices. Thus, the block size could be reduced while
the effective performance increased.
We have also managed to reduce the number of zeros actually used within
a data submatrix. Dimensions of a data submatrix are determined by the
dimensions of two diagonal blocks: the one in the same row and the one
in the same column. However, a submatrix is not necessarily full. To
avoid using the whole data submatrix we have modified the original way
of storing data submatrices as dense matrices. Now we can keep and
operate on a subset of a data submatrix specified by the top-left and
bottom-right corners. Using this, we have managed to outperform a
classical left-looking supernode-supernode algorithm [4] for medium and
large matrices.
However, performance is still worse than that of a 2D sparse Cholesky as
implemented in routine PSLDLT on an R10000 processor [5]. We will
analyze the reasons and propose ways to improve our sequential code.
References:
Fuchs, G., Roy, J., Schrem, E.:
Hypermatrix solution of large sets of symmetric positive-definite
linear equations.
Comp. Meth. Appl. Mech. Eng. (1972) 197-216
Ast, M., Fischer, R., Manz, H., Schulz, U.:
PERMAS: User's reference manual.
INTES publication no. 450, rev.d (1997)
Herrero, J.R., Navarro, J.J.:
Improving Performance of Hypermatrix Cholesky Factorization.
Euro-Par 2003,LNCS2790, Springer-Verlag (2003) 461-469
Liu, J.W., Ng, E.G., Peyton, B.W.:
Block sparse Cholesky algorithms on advanced uniprocessor computers.
SIAM J. Sci. Comput. (1993) 1034-1056
Rothberg, E., Gupta, A.:
An efficient block-oriented approach to parallel sparse Cholesky
factorization.
SIAM J. Sci. Comput. (1994) 1413-1439
Home page
Jerzy Wasniewski
2004-02-20