CS594 - Performance Optimization - Homework

Due at beginning of class on Wed., April 28

This assignment is to be done on the UTK SP2.

  1. Run the cpu command (complete path /export/usr/bin/cpu) on both a high node and a thin node on the UTK SP2 and answer the following questions for each:
    1. What type of processor?
    2. How many processors?
    3. How many caches, what kind, and what size?

  2. Read the xlf man page. Based on your answer to the third part of the preceding question, what would you specify for the -qcache option for
    1. the high nodes?
    2. the thin nodes?

  3. Rewrite the following loop nest so that the array a is accessed with stride one while still producing the same result:

      do i = 1,99
        do j = 2,100
          a(i,j) = a(i+1,j-1)*2.0
        enddo
      enddo
    

  4. Consider the following two loop nests for performing matrix multiplication:

      do i=1,n
        do k=1,n
          do j=1,n
            c(j,i)=c(j,i)+a(j,k)*b(k,i)
          enddo
        enddo
      enddo
    

      do i=1,n
        do j=1,n
          do k=1,n
            c(j,i)=c(j,i)+a(j,k)*b(k,i)
          enddo
        enddo
      enddo
    

    1. Estimate the largest value of n for which all three arrays fit into cache (be sure to specify whether this is for a high node or a thin node).
    2. Code the above loops as two separate routines. Then write a main program that sets n, fills the a and b arrays with random values, and calls the two multiplication routines. Use double precision. Also add a call to the BLAS Level 3 routine DGEMM from ESSL to perform the same matrix multiplication. Run with a range of values for n, from sizes at which the arrays fit in cache to sizes that overflow it.
    3. Compile for gprof without any optimization, and use gprof to report the times taken by each of the two routines and by DGEMM for the different values of n.
    4. Recompile with -O3 -qhot and -qreport=hotlist, then rerun and report the times taken by the routines. Examine the compiler report and state which blocking factors it chose for the arrays.
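
As a starting point for part 2, the driver might be organized roughly as follows. This is only a sketch under stated assumptions: the routine name mm_ikj and the sizes in nvals are placeholders invented for illustration (the sizes are not derived from any SP2 cache size), and only the first loop nest is shown. The DGEMM call uses the standard BLAS Level 3 argument order.

```fortran
! Sketch only: mm_ikj is a hypothetical name for the first loop nest above;
! the sizes in nvals are placeholders, not derived from any cache size.
! Link with ESSL (or another BLAS library) to resolve DGEMM.
program mmdriver
  implicit none
  integer, parameter :: nvals(4) = (/ 16, 64, 256, 512 /)
  double precision, allocatable :: a(:,:), b(:,:), c(:,:), cref(:,:)
  integer :: n, m

  do m = 1, size(nvals)
    n = nvals(m)
    allocate(a(n,n), b(n,n), c(n,n), cref(n,n))
    call random_number(a)
    call random_number(b)

    ! Hand-coded version; c must start at zero because the loop accumulates.
    c = 0.0d0
    call mm_ikj(n, a, b, c)

    ! DGEMM computes cref = alpha*a*b + beta*cref; here alpha = 1, beta = 0.
    cref = 0.0d0
    call dgemm('N', 'N', n, n, n, 1.0d0, a, n, b, n, 0.0d0, cref, n)

    ! Sanity check that both versions agree before timing them.
    print *, 'n =', n, ' max |c - cref| =', maxval(abs(c - cref))
    deallocate(a, b, c, cref)
  end do
end program mmdriver

! First loop nest from the assignment, wrapped as a separate routine so
! that it appears under its own name in a profile.
subroutine mm_ikj(n, a, b, c)
  implicit none
  integer, intent(in) :: n
  double precision, intent(in) :: a(n,n), b(n,n)
  double precision, intent(inout) :: c(n,n)
  integer :: i, j, k
  do i = 1, n
    do k = 1, n
      do j = 1, n
        c(j,i) = c(j,i) + a(j,k)*b(k,i)
      end do
    end do
  end do
end subroutine mm_ikj
```

Keeping each multiplication in its own subroutine lets a profiler attribute time to it separately. The second loop nest would be added as another routine with the same interface.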

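For part 1, one way to set up the estimate is sketched below. It assumes that the three n-by-n double-precision arrays (8 bytes per element) are the only data competing for a data cache of capacity C bytes, and it ignores associativity and conflict misses. The symbol C is introduced here for illustration; its value comes from your answer to question 1.

```latex
3 \cdot 8\,n^{2} \le C
\quad\Longrightarrow\quad
n \le \sqrt{\frac{C}{24}}
% Purely hypothetical example, not an SP2 figure:
% C = 32\,\mathrm{KiB} = 32768 \text{ bytes gives } n \le \sqrt{32768/24} \approx 36.9, \text{ so } n = 36.
```
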
browne@cs.utk.edu