Assignment 5

Parallel Conjugate Gradient

 

Due on March 22nd, 2000

 

 

Conjugate Gradient Method

 

There are many techniques for solving systems of linear equations, i.e., solving for x in A*x = b. When the system is dense, the routines in the LAPACK or ScaLAPACK libraries are appropriate. It is often the case that the system is sparse, because every variable in system depends upon only a small number of other variables. We will be covering various techniques for solving sparse systems in lecture. The Conjugate Gradient method, called CG for short, is suitable for solving any linear system where the coefficient matrix A is both symmetric, i.e. A = A', and positive definite. Three equivalent definitions of positive definiteness are:

 

     all of A's eigenvalues are positive,

     x'*A*x > 0 for all nonzero vectors x, and

     an LU factorization of A of the form A = L*L' exists (the so-called Cholesky factorization).

 

CG is often used for sparse systems, because it only requires a single operation, matrix vector multiplication; it does not even require that you actually have A, but only that you have a function for multiplying by it.

 

Here is the CG algorithm:

 

     x = initial guess for inv(A)*b

     r = b - A*x   ... residual, to be made small

     p = r         ... initial "search direction"

     repeat

        v = A*p

            ... matrix-vector multiply

        a = ( r'*r ) / ( p'*v )

            ... dot product (BLAS1)

        x = x + a*p

            ... compute updated solution

            ... using saxpy (BLAS1)

        new_r = r - a*v

            ... compute updated residual

            ... using saxpy (BLAS1)

        g = ( new_r'*new_r ) / ( r'*r )

            ... dot product (BLAS1)

        p = new_r + g*p

            ... compute updated search direction

            ... using saxpy (BLAS1)

        r = new_r

     until ( new_r'*new_r small enough )

 

Here is a rough motivation for this algorithm. CG maintains 3 vectors at each step, the approximate solution x, its residual r=A*x-b, and a search direction p, which is also called a conjugate gradient. At each step x is improved by searching for a better solution in the direction p, yielding an improved solution x+a*p. This direction p is called a gradient because we are in fact doing gradient descent on a certain measure of the error (namely sqrt(r'*inv(A)*r)). The directions pi and pj from steps i and j of the algorithm are called conjugate, or more precisely A-conjugate, because they satisfy pi'*A*pj = 0 if i.ne.j. One can also show that after i iterations xi is the "optimal" solution among all possible linear combinations of the form:

 

    a0*x + a1*(A*x) + a2*(A^2*x) + a3*(A^3*x) + ... + ai*(A^i*x)

 

For most matrices, the majority of work is in the sparse matrix-vector multiplication v=A*p in the first step. The other operations in CG are easy to parallelize. The saxpy operations are embarrassingly parallel, and require no communication. The dot-products require local dot-products and then a global add using, say, a tree to form the sum in log(p) steps.

 

The rate of convergence of CG depends on a number kappa called the condition number of A. It is defined at the ratio of the largest to the smallest eigenvalue of A (and so is always at least 1). A roughly equivalent quantity is norm(inv(A))*norm(A), where the norm of a matrix is the magnitude of the largest entry. The larger the condition number, the slower the convergence. One can show that the number of CG iterations required to reduce the error by a constant g<1 is proportional to the square root of kappa. There are methods similar to CG that are suitable for matrices A that are not symmetric or positive definite. Like CG, most of their work involves computing the matrix vector product A*p (or possibly A'*p), as well as dot-products and saxpy's. For on-line documentation and software, including advice on choosing a method, Templates for the Solution of Sparse Linear Systems (http://www.netlib.org/linalg/html_templates/Templates.html )

 

 

The Assignment

 

Parallize conjugate gradient on the TORC cluster.

 

We will provide you with the matrices, but you also need the right-hand-side of the equation (the vector b). To do this, make up a random vector x, and multiply A*x to get b. The purpose of the code you'll be timing is to compute something close to x, given only A and b. Your code should terminate when the residual is small enough, i.e. when

    new_r'*new_r < e * b'*b

where the parameter eplison e should be a command-line argument of your program. Finally, it's your job to make sure that the results are correct.

 

Matrices and Matrix Format

 

Please use matrices from Matrix Market http://math.nist.gov/MatrixMarket/

There is software to help in reading in the matrices.

 

http://math.nist.gov/MatrixMarket/data/Harwell-Boeing/bcsstruc1/bcsstruc1.html

 

The Harwell/Boeing format is a compressed column format suitable for FORTRAN. This assignment is in C, where a compressed row format would be suitable. For this assignment, you can pretend that the columns are actually rows. Since we are not interpreting any results, and since the matrices are all square, you don't have to bother transposing the format. Be aware, however, that since the matrices are symmetric, only the main diagonal and lower-triangular portion are stored in the file.

 

 

What to hand in...

 

In html format, containing:

 

     A writeup of optimizations you tried, how they worked, and what you decided to use.

 

     A pointer to a directory containing all of your code and an executable program.

      

     An analysis of your code, both communication and computation of the main loop for the matrices that you tried. Give speedup plots for one small and one large matrix.