CS 594 - Applications of Parallel Computing
Assignment 6
Due
For this assignment, we will code an optimized, memory-hierarchy-aware
matrix-matrix multiply routine. For simplicity, we require only square
matrices (handling the non-square case, essential for good library code,
can be time-consuming). The goals of the assignment are to:
· Write a matrix routine C = C + A*B for square matrices, and
· Get as close to peak performance as possible while still getting the
  correct results (please verify that your results are correct).
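
The two goals above can be illustrated with a minimal sketch of cache blocking
(loop tiling) plus a correctness check. Everything here is an assumption made
for illustration: the function names, the tile size bs, and the use of pure
Python, which shows the loop structure only and says nothing about
performance. A real submission would be written in a compiled language, with
the tile size tuned to the target machine's memory hierarchy.

```python
import random


def naive_matmul_add(A, B, C):
    """Reference result: C + A*B with the textbook triple loop."""
    n = len(A)
    out = [row[:] for row in C]
    for i in range(n):
        for j in range(n):
            s = out[i][j]
            for k in range(n):
                s += A[i][k] * B[k][j]
            out[i][j] = s
    return out


def blocked_matmul_add(A, B, C, bs=4):
    """C + A*B computed tile by tile; bs is a hypothetical tile size."""
    n = len(A)
    out = [row[:] for row in C]
    for ii in range(0, n, bs):
        for kk in range(0, n, bs):
            for jj in range(0, n, bs):
                # Work on one bs-by-bs tile; min() handles ragged edges.
                for i in range(ii, min(ii + bs, n)):
                    row_a, row_c = A[i], out[i]
                    for k in range(kk, min(kk + bs, n)):
                        a, row_b = row_a[k], B[k]
                        for j in range(jj, min(jj + bs, n)):
                            row_c[j] += a * row_b[j]
    return out


def max_abs_diff(X, Y):
    """Largest elementwise difference between two matrices."""
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


def random_matrix(n, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]


rng = random.Random(0)
n = 10  # deliberately not a multiple of the tile size
A, B, C = (random_matrix(n, rng) for _ in range(3))
err = max_abs_diff(blocked_matmul_add(A, B, C, bs=4), naive_matmul_add(A, B, C))
print("max abs difference vs. reference:", err)
```

Comparing against an independently written reference, as done here, is one
simple way to verify results; comparing against a trusted library DGEMM would
serve the same purpose.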
Rewrite your matrix multiply using Strassen's method as discussed in class.
Use the manufacturer's version of DGEMM to perform the matrix multiplications
that Strassen's method requires. Also compare the performance of your Strassen
matrix multiply with the ATLAS matrix multiply. Be sure to include verification
that your result is correct.
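
As a rough sketch of the recursion, the code below uses the textbook
seven-product formulation of Strassen's method; the variant discussed in class
may differ. The names strassen, base_multiply, and cutoff are hypothetical.
base_multiply is a plain Python stand-in for the DGEMM call, and odd-sized
blocks simply fall back to it rather than being padded.

```python
def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def base_multiply(A, B):
    """Conventional product A*B; stands in for a DGEMM call."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def split(M):
    """Return the four quadrants M11, M12, M21, M22 of an even-sized matrix."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """A*B via Strassen's recursion; small or odd sizes use base_multiply."""
    n = len(A)
    if n <= cutoff or n % 2:
        return base_multiply(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the usual eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Stitch the quadrants back into one matrix.
    return ([l + r for l, r in zip(C11, C12)] +
            [l + r for l, r in zip(C21, C22)])


# Verification on small integer matrices, where the arithmetic is exact.
n = 8
A = [[(i * n + j) % 7 - 3 for j in range(n)] for i in range(n)]
B = [[(i + 2 * j) % 5 - 2 for j in range(n)] for i in range(n)]
C = [[1] * n for _ in range(n)]
result = add(C, strassen(A, B, cutoff=2))      # C = C + A*B
reference = add(C, base_multiply(A, B))
print("matches reference:", result == reference)
```

With floating-point data the comparison should use a tolerance instead of
exact equality, since Strassen's method reorders the arithmetic and rounds
differently from the conventional algorithm.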
Describe how you would use a parallel computer to do this operation.
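
One possible starting point for such a description, offered only as a sketch,
is to partition C into row blocks and give each block to a separate worker,
since the rows of C + A*B can be computed independently. The names
parallel_matmul_add and update_row_block are hypothetical. A thread pool is
used here only to show the decomposition: in standard CPython builds, threads
generally will not speed up pure Python arithmetic, and on a real parallel
machine the same partitioning would be expressed with processes or message
passing.

```python
from concurrent.futures import ThreadPoolExecutor


def update_row_block(A, B, C, rows):
    """Compute the rows of C + A*B listed in `rows`; no other rows are touched."""
    cols = list(zip(*B))
    return [[C[i][j] + sum(a * b for a, b in zip(A[i], cols[j]))
             for j in range(len(cols))]
            for i in rows]


def parallel_matmul_add(A, B, C, workers=4):
    """C + A*B with the rows of C split into contiguous blocks, one per task."""
    n = len(A)
    step = max(1, -(-n // workers))  # ceiling division, at least one row
    blocks = [range(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda rows: update_row_block(A, B, C, rows), blocks)
        # map() preserves block order, so the rows come back in sequence.
        return [row for part in parts for row in part]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[1, 1], [1, 1]]
print(parallel_matmul_add(A, B, C, workers=2))
```

In this decomposition every worker needs all of B but only its own rows of A
and C, which is the main communication cost to discuss when describing a
distributed-memory version.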
1. J. Dongarra, P. Mayes, G. Radicati, The IBM RISC System/6000
and Linear Algebra Operations, UT, CS-90-122, December 1990. http://www.netlib.org/lapack/lawns/lawn28.ps