CS 594 - Applications of Parallel Computing

Assignment 3

Due February 17th, 1999

 

Implement, in Fortran or C, the six variants of matrix multiplication obtained by permuting the three nested loops (ijk, ikj, jik, jki, kij, kji). Use 64-bit (double-precision) arithmetic. Make each implementation a subroutine, for example:

subroutine ijk ( a, m, n, lda, b, k, ldb, c, ldc )

subroutine ikj ( a, m, n, lda, b, k, ldb, c, ldc )

...
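As a rough illustration, two of the six orderings might look like this in C. The sketch assumes column-major (Fortran-style) storage, that A is m-by-n, B is n-by-k and C is m-by-k, and that each routine accumulates C := C + A*B. The argument list above does not fix any of these conventions, and the names mm_ijk, mm_jki and IDX are hypothetical. The summation index is written p because k is already used as a dimension argument.

```c
#include <stddef.h>

/* Column-major (Fortran-style) offset of element (i,j), 0-based,
   in a matrix with leading dimension ld. Hypothetical helper. */
#define IDX(i, j, ld) ((size_t)(j) * (size_t)(ld) + (size_t)(i))

/* "ijk" ordering: i outermost, j in the middle, summation index innermost.
   Assumed shapes: A is m-by-n, B is n-by-k, C is m-by-k; C := C + A*B. */
void mm_ijk(const double *a, int m, int n, int lda,
            const double *b, int k, int ldb,
            double *c, int ldc)
{
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < k; j++) {
            double sum = c[IDX(i, j, ldc)];
            for (int p = 0; p < n; p++) {
                /* a is read along a row here (stride lda in memory) */
                sum += a[IDX(i, p, lda)] * b[IDX(p, j, ldb)];
            }
            c[IDX(i, j, ldc)] = sum;
        }
    }
}

/* "jki" ordering: j outermost, summation index in the middle, i innermost.
   Same arithmetic as mm_ijk, but the inner loop walks down columns of
   A and C, which are contiguous in column-major storage. */
void mm_jki(const double *a, int m, int n, int lda,
            const double *b, int k, int ldb,
            double *c, int ldc)
{
    for (int j = 0; j < k; j++) {
        for (int p = 0; p < n; p++) {
            double bpj = b[IDX(p, j, ldb)];
            for (int i = 0; i < m; i++) {
                c[IDX(i, j, ldc)] += a[IDX(i, p, lda)] * bpj;
            }
        }
    }
}
```

The remaining four orderings follow by permuting the same three loops. The arithmetic is identical in every case; only the order in which memory is traversed changes.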

Construct a driver program that generates random matrices and calls each matrix multiply routine with square matrices of orders 50, 100, 150, 200, …, 500, timing each call and computing its Mflop/s rate.
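The Mflop/s rate can be worked out from the operation count: multiplying two n-by-n matrices takes n^3 multiplications and n^3 additions, i.e. 2n^3 floating-point operations, so the rate is 2n^3 / (10^6 * t) for an elapsed time of t seconds. A minimal C sketch of that arithmetic follows. The function names are hypothetical, and cpu_seconds is only an assumed stand-in, built on the standard C clock() function, for the SECOND routine mentioned in this handout.

```c
#include <time.h>

/* Floating-point operations in an n-by-n matrix multiply:
   n^3 multiplications plus n^3 additions. */
double matmul_flops(int n)
{
    double d = (double)n;
    return 2.0 * d * d * d;
}

/* Rate in Mflop/s for one n-by-n multiply that took `seconds` seconds. */
double mflops_rate(int n, double seconds)
{
    return matmul_flops(n) / (seconds * 1.0e6);
}

/* CPU time in seconds since program start, via standard clock().
   Hypothetical substitute for SECOND; its resolution may be coarse. */
double cpu_seconds(void)
{
    return (double)clock() / (double)CLOCKS_PER_SEC;
}
```

In a driver, one would call cpu_seconds() before and after each multiply and pass the difference to mflops_rate. For the smaller orders the elapsed time may fall below the clock resolution, so it may be necessary to repeat the call several times and divide the total time by the repetition count.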

Run your program on at least one RISC-based architecture. A few you can try are:

austin.cs.utk.edu

IBM RS/6000, 41 MHz, max rate: 82 Mflop/s, cache size: 64 KB, Fortran: xlf -O3 -lessl

ig.cs.utk.edu

DEC Alpha, 266 MHz, max rate: 532 Mflop/s, cache size:

nala.cs.utk.edu

SUN Ultra 2, 200 MHz, max rate: 400 Mflop/s, cache size:

Use the highest level of optimization. Include in your timing routine a call to the following system-supplied routine:

call dgemm('No', 'No', n, n, n, 1.0d0, a, lda, b, ldb, 1.0d0, c, ldc )

(This is a routine provided in the ESSL Library for computing matrix multiply.)

 

Write up a description of the timings and explain why the routines perform as they do. Include in your write-up ideas on how to make the matrix multiplication even faster.

You can call routine SECOND to collect the execution time. The source for routine SECOND can be found in ~dongarra/CS594/second.f.