Homework 10: Multivariate Statistical Analysis of Hardware Counter Data
For this assignment you will use the MATLAB 7/Release 14
Statistics Toolbox to do Principal Components Analysis of hardware counter data
from a parallel ocean modeling application called HYCOM.
MATLAB Release 14 is installed on TORC. To run it, set your DISPLAY environment
variable and add /usr/local/matlab/bin to your path. Then simply type “matlab &”.
See the online help under Statistics Toolbox/Multivariate Statistics for
an explanation and demo of Principal Components Analysis.
You will need the following files:
- HYCOMcounters.csv – CSV (Comma-Separated Values) file with hardware counter
  data for 124 processes and 8 different hardware counter metrics (native IBM
  POWER4 events).
- HYCOMcountersfull.csv – same as HYCOMcounters.csv, but with human-readable
  names of the counter events added.
- POWERevents.csv – CSV file with descriptions of IBM POWER native events.
- MATLAB M-files with definitions of convenience functions:
Please do the following:
- Use Import Data on the MATLAB File menu to import the data from
  HYCOMcounters.csv and use the buildMatrix function to convert the data to
  a 124 x 8 array with the processes as the rows and the hardware counter
  metrics as the columns.
- Perform Principal Components Analysis on the hardware counter data and
  answer the following:
  - Which counters contribute the most to the first three principal
    components?
  - What percent of variability in the data is explained by the first three
    principal components?
- You may either use the MATLAB Statistics Toolbox functions directly or use
  the convenience functions in the above M-files (see the April 13 lecture
  for an example of how to use these) to do the Principal Components Analysis.
- Please turn in a script of the MATLAB commands you used to do the Principal
  Components Analysis, your answers to the above questions, and any
  additional observations.
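As a rough illustration of the direct Statistics Toolbox route mentioned above, the following sketch runs princomp on a stand-in matrix and reports the variance explained and the largest loadings. It makes several assumptions: the random matrix X only takes the place of the 124 x 8 array you would build from HYCOMcounters.csv, the zscore standardization is one reasonable choice when counters differ widely in magnitude (not a requirement of the assignment), and the convenience M-files may organize these steps differently.

```matlab
% Stand-in data: 124 processes (rows) x 8 counter metrics (columns).
% Replace X with the array built from HYCOMcounters.csv; random values are
% used here only so the sketch runs on its own.
X = rand(124, 8);

% Assumption: standardize each column so that counters with large raw
% magnitudes do not dominate the components.
Z = zscore(X);

% coeff:  8 x 8 loadings, one column per principal component
% score:  124 x 8 data projected onto the components
% latent: variance (eigenvalue) of each component, largest first
[coeff, score, latent] = princomp(Z);

% Percent of total variance explained by each component, and cumulatively.
pctExplained = 100 * latent / sum(latent);
cumExplained = cumsum(pctExplained);
fprintf('First three PCs explain %.1f%% of the variance\n', cumExplained(3));

% For each of the first three components, list the three columns (counters)
% with the largest absolute loadings.
for k = 1:3
    [loading, idx] = sort(abs(coeff(:, k)), 'descend');
    fprintf('PC%d: largest loadings on columns %d, %d, %d\n', ...
            k, idx(1), idx(2), idx(3));
end
```

The column indices printed by the loop correspond to the counter metrics in column order, so they can be matched to the event names in HYCOMcountersfull.csv. In later MATLAB releases the pca function returns the same first three outputs as princomp.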
Due April 27, before class.