CS 594 - Applications of Parallel Computing
Assignment 4
Due March 1st, 2000
For this assignment, we will be coding an optimized, memory-hierarchy-aware
matrix-matrix multiply routine. For simplicity, we will only require square
matrices (handling the non-square case, essential for good library code, can
be time-consuming). The goals of the assignment are to:
· Write a matrix routine C = C + A*B for square matrices, and
· Get as close to peak performance as possible, while still getting the correct
  results.
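As a starting point for the two goals above, here is a minimal sketch of a cache-blocked C = C + A*B next to the plain triple loop it should agree with. It assumes row-major storage and a tile size `bs` passed by the caller. The function names (`matmul_naive`, `matmul_blocked`, `max_abs_diff`) are illustrative and not part of the assignment, and the best tile size depends on the cache sizes of the machine being tuned for.

```c
#include <assert.h>
#include <stddef.h>

/* Reference triple loop: C = C + A*B, all n x n, row-major. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = C[i * n + j];
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* Blocked version: C = C + A*B computed tile by tile, with bs x bs tiles
 * (assumed tile size) so each tile of A, B and C is reused while in cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *A, const double *B, double *C)
{
    assert(bs > 0);
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs) {
                /* Clip the tile at the matrix edge when bs does not divide n. */
                size_t i_end = min_sz(ii + bs, n);
                size_t k_end = min_sz(kk + bs, n);
                size_t j_end = min_sz(jj + bs, n);
                for (size_t i = ii; i < i_end; i++)
                    for (size_t k = kk; k < k_end; k++) {
                        double a = A[i * n + k];
                        /* Innermost loop walks rows of B and C with unit stride. */
                        for (size_t j = jj; j < j_end; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

/* Largest absolute elementwise difference between two n x n matrices. */
double max_abs_diff(size_t n, const double *X, const double *Y)
{
    double worst = 0.0;
    for (size_t i = 0; i < n * n; i++) {
        double d = X[i] - Y[i];
        if (d < 0.0) d = -d;
        if (d > worst) worst = d;
    }
    return worst;
}
```

To check correctness, run both routines on the same inputs and compare the results with `max_abs_diff`. For performance, a multiply of order n performs 2n^3 floating-point operations, so the achieved rate is 2n^3 divided by the measured time.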
Rewrite your matrix multiply using Strassen's method as discussed in class. Use
the manufacturer's (vendor-supplied) version of DGEMM to perform the matrix
multiplications you will need. Also compare the performance of your version of
Strassen's matrix multiply with the ATLAS version. Be sure to include a
verification that your result is correct.
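The following is a minimal sketch of one level of Strassen's method for C = C + A*B, using the standard seven-product formulation (the version presented in class may order or name the products differently). It assumes row-major storage and an even order n. The helper `base_gemm` is a hypothetical stand-in for the seven half-size multiplies: in an actual solution it would be replaced by a call to the vendor DGEMM, which uses column-major storage. The names `combine`, `accumulate`, `strassen_one_level` and `strassen_check` are likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for DGEMM (assumption): M = S*T, contiguous h x h row-major blocks. */
static void base_gemm(size_t h, const double *S, const double *T, double *M)
{
    for (size_t i = 0; i < h * h; i++) M[i] = 0.0;
    for (size_t i = 0; i < h; i++)
        for (size_t k = 0; k < h; k++) {
            double s = S[i * h + k];
            for (size_t j = 0; j < h; j++) M[i * h + j] += s * T[k * h + j];
        }
}

/* Z = X + sign*Y for h x h blocks stored with row length ld; Y == NULL gives Z = X. */
static void combine(size_t h, size_t ld, const double *X, const double *Y,
                    double sign, double *Z)
{
    for (size_t i = 0; i < h; i++)
        for (size_t j = 0; j < h; j++)
            Z[i * h + j] = X[i * ld + j] + (Y ? sign * Y[i * ld + j] : 0.0);
}

/* Cblk += sign*M, where Cblk is an h x h block stored with row length ld. */
static void accumulate(size_t h, size_t ld, const double *M, double sign, double *Cblk)
{
    for (size_t i = 0; i < h; i++)
        for (size_t j = 0; j < h; j++) Cblk[i * ld + j] += sign * M[i * h + j];
}

/* One Strassen level: C = C + A*B, n even. Returns 0, or -1 if allocation fails. */
int strassen_one_level(size_t n, const double *A, const double *B, double *C)
{
    assert(n % 2 == 0);
    if (n == 0) return 0;
    size_t h = n / 2;
    const double *A11 = A, *A12 = A + h, *A21 = A + h * n, *A22 = A + h * n + h;
    const double *B11 = B, *B12 = B + h, *B21 = B + h * n, *B22 = B + h * n + h;
    double *C11 = C, *C12 = C + h, *C21 = C + h * n, *C22 = C + h * n + h;
    double *W = malloc(3 * h * h * sizeof *W);   /* workspace for S, T and M */
    if (!W) return -1;
    double *S = W, *T = W + h * h, *M = W + 2 * h * h;

    /* M1 = (A11 + A22)(B11 + B22): contributes to C11 and C22 */
    combine(h, n, A11, A22, 1, S); combine(h, n, B11, B22, 1, T); base_gemm(h, S, T, M);
    accumulate(h, n, M, 1, C11); accumulate(h, n, M, 1, C22);
    /* M2 = (A21 + A22) B11: +C21, -C22 */
    combine(h, n, A21, A22, 1, S); combine(h, n, B11, NULL, 0, T); base_gemm(h, S, T, M);
    accumulate(h, n, M, 1, C21); accumulate(h, n, M, -1, C22);
    /* M3 = A11 (B12 - B22): +C12, +C22 */
    combine(h, n, A11, NULL, 0, S); combine(h, n, B12, B22, -1, T); base_gemm(h, S, T, M);
    accumulate(h, n, M, 1, C12); accumulate(h, n, M, 1, C22);
    /* M4 = A22 (B21 - B11): +C11, +C21 */
    combine(h, n, A22, NULL, 0, S); combine(h, n, B21, B11, -1, T); base_gemm(h, S, T, M);
    accumulate(h, n, M, 1, C11); accumulate(h, n, M, 1, C21);
    /* M5 = (A11 + A12) B22: -C11, +C12 */
    combine(h, n, A11, A12, 1, S); combine(h, n, B22, NULL, 0, T); base_gemm(h, S, T, M);
    accumulate(h, n, M, -1, C11); accumulate(h, n, M, 1, C12);
    /* M6 = (A21 - A11)(B11 + B12): +C22 */
    combine(h, n, A21, A11, -1, S); combine(h, n, B11, B12, 1, T); base_gemm(h, S, T, M);
    accumulate(h, n, M, 1, C22);
    /* M7 = (A12 - A22)(B21 + B22): +C11 */
    combine(h, n, A12, A22, -1, S); combine(h, n, B21, B22, 1, T); base_gemm(h, S, T, M);
    accumulate(h, n, M, 1, C11);

    free(W);
    return 0;
}

/* Verification: largest |C_after - (C_before + A*B)|, using the plain triple loop. */
double strassen_check(size_t n, const double *A, const double *B,
                      const double *C_before, const double *C_after)
{
    double worst = 0.0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double ref = C_before[i * n + j];
            for (size_t k = 0; k < n; k++) ref += A[i * n + k] * B[k * n + j];
            double d = C_after[i * n + j] - ref;
            if (d < 0.0) d = -d;
            if (d > worst) worst = d;
        }
    return worst;
}
```

The sketch handles only a single level and even n; odd orders would need padding or a fallback to the ordinary multiply. Because Strassen's method reorders the arithmetic, its result generally differs from the triple loop by rounding error, so the check should compare the value from `strassen_check` against a small tolerance rather than test for exact equality.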
Reading:
J. Dongarra, P. Mayes, and G. Radicati, The IBM RISC System/6000 and Linear
Algebra Operations, UT, CS-90-122, December 1990.
http://www.netlib.org/lapack/lawns/lawn28.ps